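Fetching Kuaishou videos and comments

Third-party libraries to install

requests and xlwt (json is also used, but it ships with Python's standard library and needs no installation)

Step 1:

Open the Kuaishou web version and log in, find the user whose videos you want to download, open their profile page, and press F12 to open the developer tools. Refresh the page and select the XHR filter: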

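Step 2:

The graphql entries under Name on the left hold the user and video data. Their order is not fixed, but in most cases the video information is in the second graphql entry.

Click Preview to see the video information. Now that we have found it, we only need to extract it. However, every graphql request goes to the same URL, so a plain .get request cannot tell the server which data we want. We have to send additional information along with the request for it to return what we need.

All the information we need to send can be found at the bottom of the graphql entry's Headers tab.

Important: the Cookie value can only be obtained after logging in.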

# Request headers (paste your own Cookie value after logging in)
headers = {
    "Cookie": "",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
}
# Payload sent to the server in the POST body
post = {"operationName":"visionProfilePhotoList","variables":{"userId":userId,"pcursor":pcursor,"page":"profile"},"query":"query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      type\n      author {\n        id\n        name\n        following\n        headerUrl\n        headerUrls {\n          cdn\n          url\n          __typename\n        }\n        __typename\n      }\n      tags {\n        type\n        name\n        __typename\n      }\n      photo {\n        id\n        duration\n        caption\n        likeCount\n        realLikeCount\n        coverUrl\n        coverUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrl\n        liked\n        timestamp\n        expTag\n        animatedCoverUrl\n        stereoType\n        videoRatio\n        profileUserTopPhoto\n        __typename\n      }\n      canAddComment\n      currentPcursor\n      llsid\n      status\n      __typename\n    }\n    hostName\n    pcursor\n    __typename\n  }\n}\n"
}

There is one catch: in the original page the query contains the symbol ↵, which stands for a line break.

query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {↵  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {↵    result↵    llsid↵    webPageArea↵    feeds {↵      type↵      author {↵        id↵        name↵        following↵        headerUrl↵        headerUrls {↵          cdn↵          url↵          __typename↵        }↵        __typename↵      }↵      tags {↵        type↵        name↵        __typename↵      }↵      photo {↵        id↵        duration↵        caption↵        likeCount↵        realLikeCount↵        coverUrl↵        coverUrls {↵          cdn↵          url↵          __typename↵        }↵        photoUrls {↵          cdn↵          url↵          __typename↵        }↵        photoUrl↵        liked↵        timestamp↵        expTag↵        animatedCoverUrl↵        stereoType↵        videoRatio↵        profileUserTopPhoto↵        __typename↵      }↵      canAddComment↵      currentPcursor↵      llsid↵      status↵      __typename↵    }↵    hostName↵    pcursor↵    __typename↵  }↵}↵

Pasting the text in this form raises an error, so use the string .replace method (or edit it by hand) to replace every "↵" with "\n".

# raw_query is the text copied from the developer tools, "↵" markers included
# (the name str is avoided because it would shadow Python's built-in type)
raw_query = "query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {↵  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {↵    result↵    llsid↵    webPageArea↵    feeds {↵      type↵      author {↵        id↵        name↵        following↵        headerUrl↵        headerUrls {↵          cdn↵          url↵          __typename↵        }↵        __typename↵      }↵      tags {↵        type↵        name↵        __typename↵      }↵      photo {↵        id↵        duration↵        caption↵        likeCount↵        realLikeCount↵        coverUrl↵        coverUrls {↵          cdn↵          url↵          __typename↵        }↵        photoUrls {↵          cdn↵          url↵          __typename↵        }↵        photoUrl↵        liked↵        timestamp↵        expTag↵        animatedCoverUrl↵        stereoType↵        videoRatio↵        profileUserTopPhoto↵        __typename↵      }↵      canAddComment↵      currentPcursor↵      llsid↵      status↵      __typename↵    }↵    hostName↵    pcursor↵    __typename↵  }↵}↵"
# Replace each "↵" with a real newline. Use "\n" here, not r"\n":
# the raw string would insert a literal backslash followed by the letter n.
query = raw_query.replace("↵", "\n")
# query is now equal to the following string literal:
query = "query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      type\n      author {\n        id\n        name\n        following\n        headerUrl\n        headerUrls {\n          cdn\n          url\n          __typename\n        }\n        __typename\n      }\n      tags {\n        type\n        name\n        __typename\n      }\n      photo {\n        id\n        duration\n        caption\n        likeCount\n        realLikeCount\n        coverUrl\n        coverUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrl\n        liked\n        timestamp\n        expTag\n        animatedCoverUrl\n        stereoType\n        videoRatio\n        profileUserTopPhoto\n        __typename\n      }\n      canAddComment\n      currentPcursor\n      llsid\n      status\n      __typename\n    }\n    hostName\n    pcursor\n    __typename\n  }\n}\n"
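The same cleanup can be wrapped in a small helper so that any query copied from the developer tools is handled the same way. This is only a sketch: `normalize_query` is a hypothetical name, and the short sample query is made up for illustration rather than taken from Kuaishou.

```python
def normalize_query(copied_text: str) -> str:
    """Replace the "↵" markers shown by the developer tools with real newlines."""
    return copied_text.replace("↵", "\n")


# A shortened, made-up query used only to demonstrate the helper
sample = "query demo {↵  field↵}↵"
cleaned = normalize_query(sample)

# repr() shows the newline characters as \n, the form you would paste into source code
print(repr(cleaned))
```

Printing with repr() is a convenient way to check that no "↵" characters are left before the string goes into the request payload.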

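Step 3:

Use requests.post to fetch the information we need, then use json.loads() to convert the returned text into a dictionary, which makes the fields easy to read.

FirstUrl = "https://www.kuaishou.com/graphql"
# Send the POST request with the headers and payload prepared above
response = requests.post(url=FirstUrl, headers=headers, json=post)
# Convert the returned JSON text into a dictionary
html = json.loads(response.text)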
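Once the response is a dictionary, the video entries sit three levels down, under data, then visionProfilePhotoList, then feeds (the same path the full source code follows). The sketch below walks that path defensively, on the assumption that a level may be missing, for example when the Cookie is empty or expired. `extract_feeds` and `fake_payload` are hypothetical names, and the payload is hand-written rather than a real response.

```python
def extract_feeds(payload: dict) -> list:
    """Walk data -> visionProfilePhotoList -> feeds, returning [] when a level is missing."""
    data = payload.get("data") or {}
    photo_list = data.get("visionProfilePhotoList") or {}
    return photo_list.get("feeds") or []


# A hand-written stand-in for json.loads(response.text); real responses carry many more fields
fake_payload = {
    "data": {
        "visionProfilePhotoList": {
            "feeds": [
                {"photo": {"caption": "demo", "photoUrl": "https://example.com/a.mp4"}}
            ]
        }
    }
}

for feed in extract_feeds(fake_payload):
    print(feed["photo"]["caption"], feed["photo"]["photoUrl"])
```

Returning an empty list instead of raising keeps the later loops simple: they just do nothing when the request did not return any videos.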

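Step 4:

Once we have the video URLs, download each video and save it in a newly created folder.

Create the folder:

h = r"F:\快手\\"
if not os.path.exists(h):
    os.mkdir(h)

Download the videos:

    # n holds the names returned by GET_NAME(); names and URLs are in the same order,
    # so each URL is paired with its matching name
    for count, (url, name) in enumerate(zip(GET_URL(), n), start=1):
        response = requests.get(url)
        # Report success only when the server actually returned the file
        if response.ok:
            with open(h + name + ".mp4", "wb") as f:
                f.write(response.content)
            print("Video " + str(count) + " downloaded successfully!")
        else:
            print("Video " + str(count) + " failed to download!")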
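Video captions are used as file names, and captions can contain line breaks or characters that Windows does not allow in a file name. A minimal sketch of one way to clean and de-duplicate them follows; `safe_filename` and `unique_names` are hypothetical helpers, not part of the original script, and the set of characters removed is the usual Windows list.

```python
import re


def safe_filename(caption: str, fallback: str = "video") -> str:
    """Strip line breaks and the characters Windows forbids in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]', "", caption).strip()
    # An empty result (for example a caption made only of symbols) gets a fallback name
    return cleaned or fallback


def unique_names(captions: list) -> list:
    """Append a counter to repeated names so that no download overwrites another."""
    seen = {}
    result = []
    for caption in captions:
        base = safe_filename(caption)
        count = seen.get(base, 0)
        seen[base] = count + 1
        result.append(base if count == 0 else base + str(count))
    return result


print(unique_names(["a/b", "a/b", "hello\nworld"]))
```

The counter suffix follows the same idea as the duplicate check in GET_NAME() in the full source, only keyed per name instead of using one shared counter.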

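Step 5:

Fetch the comments. There are both comments and replies to comments; the details are not covered here, but the full source code is given below.

Comments are also delivered through graphql, but you first have to open any video, which makes a new graphql entry appear:

The comments can be fetched in the same way as the video information. Only the payload sent to the URL is different: send it with a POST request, then parse the returned text with json and read the fields.

The source code is as follows: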
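As a sketch of what the comment request and its handling might look like: the operation name, variables and field names below are taken from the full source code, but the query is trimmed to a few fields, which assumes the server accepts a subset of the original selection. `build_comment_payload`, `flatten_comments` and the sample response are hypothetical and for illustration only.

```python
# Trimmed query: it is an assumption that the server accepts this subset of fields
COMMENT_QUERY = """query commentListQuery($photoId: String, $pcursor: String) {
  visionCommentList(photoId: $photoId, pcursor: $pcursor) {
    commentCount
    pcursor
    rootComments {
      authorName
      content
      subComments {
        authorName
        content
      }
    }
  }
}
"""


def build_comment_payload(photo_id: str, pcursor: str = "") -> dict:
    """Build the POST body for the comments of one video."""
    return {
        "operationName": "commentListQuery",
        "variables": {"photoId": photo_id, "pcursor": pcursor},
        "query": COMMENT_QUERY,
    }


def flatten_comments(payload: dict) -> list:
    """Return (author, text, replies) tuples; replies is a list of (author, text)."""
    comment_list = (payload.get("data") or {}).get("visionCommentList") or {}
    records = []
    for comment in comment_list.get("rootComments") or []:
        replies = [
            (reply.get("authorName"), reply.get("content"))
            for reply in comment.get("subComments") or []
        ]
        records.append((comment.get("authorName"), comment.get("content"), replies))
    return records


# Hand-written stand-in for the parsed response of one video
sample_response = {
    "data": {
        "visionCommentList": {
            "rootComments": [
                {
                    "authorName": "user_a",
                    "content": "nice",
                    "subComments": [{"authorName": "user_b", "content": "agreed"}],
                },
                {"authorName": "user_c", "content": "hello", "subComments": []},
            ]
        }
    }
}

for author, text, replies in flatten_comments(sample_response):
    print(author, text, replies)
```

Keeping each comment together with its own replies avoids the bookkeeping needed when comments and replies are stored in separate flat lists.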

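#------Kuaishou information retrieval------
# coding=utf-8
import time
import requests
import os
import json
import xlwt

userId = input("Enter the userID: ")
pcursor = input("Enter the pcursor: ")
page = input("Enter a number used to label the Excel file (integer): ")
# Create the download folder if it does not exist yet
h = r"F:\快手\\"
if not os.path.exists(h):
    os.mkdir(h)
# Request headers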
headers = {
    # Paste the Cookie value copied from your own logged-in session
    "Cookie": "",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
}
# Payload sent to the server in the POST body
post = {"operationName":"visionProfilePhotoList","variables":{"userId":userId,"pcursor":pcursor,"page":"profile"},"query":"query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      type\n      author {\n        id\n        name\n        following\n        headerUrl\n        headerUrls {\n          cdn\n          url\n          __typename\n        }\n        __typename\n      }\n      tags {\n        type\n        name\n        __typename\n      }\n      photo {\n        id\n        duration\n        caption\n        likeCount\n        realLikeCount\n        coverUrl\n        coverUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrl\n        liked\n        timestamp\n        expTag\n        animatedCoverUrl\n        stereoType\n        videoRatio\n        profileUserTopPhoto\n        __typename\n      }\n      canAddComment\n      currentPcursor\n      llsid\n      status\n      __typename\n    }\n    hostName\n    pcursor\n    __typename\n  }\n}\n"
}
FirstUrl = "https://www.kuaishou.com/graphql"
# Send the POST request with the headers and payload prepared above
response = requests.post(url=FirstUrl, headers=headers, json=post)
# Convert the returned JSON text into a dictionary
html = json.loads(response.text)
# First level (dict)
First_message = html.get("data")
# Second level (dict)
Second_message = First_message.get("visionProfilePhotoList")
# Third level (list)
Third_message = Second_message.get("feeds")

# The functions below walk this list and pull out the individual fields
# Get the video names
def GET_NAME():
    a = 1
    Name_list = []
    for Onemessage in Third_message:
        Get_Video_name = Onemessage.get("photo")
        One_name = Get_Video_name.get("caption")
        # Captions can contain line breaks, which are not valid in file names, so remove them
        One_name_1 = One_name.replace("\n", "")
        # De-duplicate names; a repeated name would overwrite a video already downloaded
        if One_name_1 not in Name_list:
            Name_list.append(One_name_1)
        else:
            Name_list.append(One_name_1 + str(a))
            a = a + 1
    return Name_list

# Get the video URLs
def GET_URL():
    # Collect the URLs in a list
    Url_list = []
    for Onemessage in Third_message:
        Get_url = Onemessage.get("photo")
        Url = Get_url.get("photoUrl")
        Url_list.append(Url)
    # Return the list of URLs
    return Url_list

# Download the videos
def Download_video():
    # Names and URLs are in the same order, so each URL is paired with its matching name
    names = GET_NAME()
    for count, (url, name) in enumerate(zip(GET_URL(), names), start=1):
        response = requests.get(url)
        # Report success only when the server actually returned the file
        if response.ok:
            with open(h + name + ".mp4", "wb") as f:
                f.write(response.content)
            print("Video " + str(count) + " downloaded successfully!")
        else:
            print("Video " + str(count) + " failed to download!")


def Get_realLikeCount():
    # Collect the like count of every video
    counts = []
    for datas in Third_message:
        Up_realLikeCount = datas.get("photo")
        counts.append(Up_realLikeCount.get("realLikeCount"))
    return counts


def Get_comment():
    headers_comment = {
        # Paste the Cookie value copied from your own logged-in session
        "Cookie": "",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
    }
    # Lists that store the collected information
    First_list_names = []
    First_list_texts = []
    First_comment_distribution = []
    Second_comment_distribution = []
    # The reply lists are created once, outside the loops, so replies to every comment are kept
    Second_list_names = []
    Second_list_texts = []
    for s in Third_message:
        commentCount_url_1 = s.get("photo")
        # The photo id tells the server which video's comments to return
        commentCount_url = commentCount_url_1.get("id")
        post_comment = {"operationName": "commentListQuery", "variables": {"photoId": commentCount_url, "pcursor": ""}, "query": "query commentListQuery($photoId: String, $pcursor: String) {\n  visionCommentList(photoId: $photoId, pcursor: $pcursor) {\n    commentCount\n    pcursor\n    rootComments {\n      commentId\n      authorId\n      authorName\n      content\n      headurl\n      timestamp\n      likedCount\n      realLikedCount\n      liked\n      status\n      subCommentCount\n      subCommentsPcursor\n      subComments {\n        commentId\n        authorId\n        authorName\n        content\n        headurl\n        timestamp\n        likedCount\n        realLikedCount\n        liked\n        status\n        replyToUserName\n        replyTo\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"}
        # Request the comment data with a POST
        response_comment = requests.post(url=FirstUrl, headers=headers_comment, json=post_comment)
        # Parse the JSON text into a dictionary, then read the fields with .get
        comment_html = json.loads(response_comment.text)
        comment_text_1 = comment_html.get("data")
        comment_text_2 = comment_text_1.get("visionCommentList")
        comments = comment_text_2.get("rootComments") or []
        # Record how many comments this video has
        First_comment_distribution.append(len(comments))
        for comment in comments:
            First_comment_name = comment.get("authorName")
            First_comment_text = comment.get("content")
            comment_second_text = comment.get("subComments") or []
            First_list_names.append(First_comment_name)
            First_list_texts.append(First_comment_text)
            # Record how many replies this comment has
            Second_comment_distribution.append(len(comment_second_text))
            for comment_second in comment_second_text:
                Second_list_names.append(comment_second.get("authorName"))
                Second_list_texts.append(comment_second.get("content"))
    # Return everything that was collected
    return First_list_names, First_list_texts, Second_list_names, Second_list_texts, First_comment_distribution, Second_comment_distribution


def Write_excal(comment_data):
    write_name = comment_data[0]
    write_text = comment_data[1]
    file = xlwt.Workbook()
    sheet1 = file.add_sheet('Comments', cell_overwrite_ok=True)
    rowTitle = ['No.', 'User name', 'Comment']
    for j in range(len(rowTitle)):
        sheet1.write(0, j, rowTitle[j])
    for k in range(len(write_name)):
        sheet1.write(k + 1, 0, k + 1)
        sheet1.write(k + 1, 1, write_name[k])
        sheet1.write(k + 1, 2, write_text[k])
    # xlwt writes the legacy .xls format, so the file is saved as .xls rather than .xlsx.
    # The target folder must already exist.
    file.save(os.path.join(r"F:\python爬虫\爬虫实战\快手", "快手评论" + str(page) + ".xls"))


def Pr_output(comment_data):
    # First-level names and texts, reply names and texts,
    # comments per video (sums) and replies per comment (Second_nums)
    First_names, First_texts, Second_names, Second_texts, sums, Second_nums = comment_data
    comment_index = 0  # position in the flat first-level lists
    reply_index = 0    # position in the flat reply lists
    # sums splits the flat comment list into one slice per video
    for j, i in enumerate(sums):
        if i == 0:
            print("Video " + str(j + 1) + " has no comments")
        else:
            print("Comments on video " + str(j + 1) + ":")
            for k in range(comment_index, comment_index + i):
                print(First_names[k] + ":  " + First_texts[k])
                # Second_nums[k] is the number of replies to this comment
                reply_count = Second_nums[k]
                if reply_count == 0:
                    print("    No replies")
                for r in range(reply_index, reply_index + reply_count):
                    print("    Reply - user name: " + Second_names[r] + "    text: " + Second_texts[r])
                reply_index += reply_count
            comment_index += i
        print("\n")


if __name__ == '__main__':
    # Record the start time
    start = time.perf_counter()
    print("Downloading videos".center(100, "-"))
    Download_video()
    print("Download finished".center(100, "-"))
    print("\n")
    # Pause for two seconds before continuing
    time.sleep(2)
    # Fetch the comments once and reuse the result for printing and for the Excel file
    comment_data = Get_comment()
    print("Printing comments".center(100, "-"))
    Pr_output(comment_data)
    print("Printing finished".center(100, "-"))
    print("\n")
    print("Generating Excel file".center(100, "-"))
    Write_excal(comment_data)
    print("Excel file generated".center(100, "-"))
    # Total running time
    t = time.perf_counter() - start
    print("Total time: {:.2f} seconds".format(t))

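Note:

A single graphql response carries only a limited number of videos. If the author has many videos, scroll down the page to load more; each new batch arrives in a new graphql entry, which can be fetched in the same way. The same applies to comments.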
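Scrolling by hand can in principle be replaced by a loop that requests one page after another. The sketch below assumes that the pcursor value returned in each response is the cursor for the next page, and that it comes back empty or as "no_more" once there is nothing left; neither assumption is confirmed by this article, so check both against real responses. `iter_all_feeds` and `fake_fetch` are hypothetical names, and the fetcher is a stand-in for a function that would send the POST request and return the feeds together with the next cursor.

```python
from typing import Callable, Iterator, List, Tuple


def iter_all_feeds(
    fetch_page: Callable[[str], Tuple[List[dict], str]],
    start_cursor: str = "",
    end_markers: Tuple[str, ...] = ("", "no_more"),  # assumed end-of-list values
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield feeds page by page until the cursor signals the end."""
    cursor = start_cursor
    for _ in range(max_pages):  # hard cap so a misbehaving cursor cannot loop forever
        feeds, next_cursor = fetch_page(cursor)
        yield from feeds
        # Stop when there is no usable next cursor or the cursor did not advance
        if next_cursor is None or next_cursor in end_markers or next_cursor == cursor:
            break
        cursor = next_cursor


# Stand-in for the real request: maps a cursor to (feeds, next cursor)
pages = {
    "": ([{"id": 1}, {"id": 2}], "c1"),
    "c1": ([{"id": 3}], "no_more"),
}


def fake_fetch(cursor: str) -> Tuple[List[dict], str]:
    return pages[cursor]


print([feed["id"] for feed in iter_all_feeds(fake_fetch)])
```

In a real run, adding a short time.sleep between pages would keep the request rate modest.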
