用python爬取快手评论（烂活新整）

前提，上次我用selenium写了一个抖音直播评论获取，这次烂活新整，用python发送post请求获取快手的视频评论！

1.首先打开网页版的快手

在网页里面按下F12，打开开发者模式，点击网络，查看Fetch/XHR。看看里面的请求。

找到一个叫graphql的请求。这个就是评论的请求。我们点击进去，然后查看预览。可以看到如下效果。

返回的是一个json数据，这下就好办了。我们现在只要模仿浏览器给快手服务器发送请求就行了。

2.发送请求给快手服务器

首先，我们来看一下这个是个什么请求，是个get请求还是post请求。（一般都是post）点击表头。查看请求。

我们可以看到，这是个post请求，请求url是https://www.kuaishou.com/graphql，发送post请求都是要带一些表单数据的。

看看表单数据要传输哪些数据

可以看到这个请求负载里的数据就是我们要传输的数据了，这个请求负载不同于Form data的。Form data传输的一个字典，而 Request Payload是一个json数据。这个有所不同，如果你此时还是传入一个dict数据的话，是不会返回数据的。必须传入json数据。那么我现在就可以用python发送请求了。做到这里一切都很顺利！！

3.用python发送请求

import requests
import json
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36 Edg/94.0.992.31','Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_1e83306028e4691a0acb7f78ecfbf145; didv=1632459485000; client_key=65890b29; Hm_lvt_86a27b7db2c5c0ae37fee4a8a35033ee=1632459683; userId=153866369; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABSkrOjzdufZa4lBYXxddWpU-Da_CebI6GVHm2vvSBBxRUv9kJ_BKaJ3weX3FWf3acdLw2yy6uCM1MpHB8Pfi7EnJBlcRb0GqyXYMpPlaCdMqKlVo0PI-iY9zN0wdxA89wOXtml-fWR7CFTT54hsd3PZTsTwlWBA7vhJvwim07-A1RaTmwi66PbBkj7eCrkmJX0hdCln9MiFVyA_CL44TScBoSuDcrlwmr6APhXfdZrBO5uo0FIiA8xE-BzwPs3Wp_Q9mI4y5GcZxo1E-B0xr4CkR4zohqJigFMAE; kuaishou.server.web_ph=eeb2272fa742f8d8f79bd635c3e8044ac1f8'}
data = {"operationName": "commentListQuery","variables": {"photoId": "3xhcrvum26yz3vg","pcursor": ''},"query": "query commentListQuery($photoId: String, $pcursor: String) {\n visionCommentList(photoId: $photoId, pcursor: $pcursor) {\n   commentCount\n   pcursor\n   rootComments {\n     commentId\n     authorId\n     authorName\n     content\n     headurl\n     timestamp\n     likedCount\n     realLikedCount\n     liked\n     status\n     subCommentCount\n     subCommentsPcursor\n     subComments {\n       commentId\n       authorId\n       authorName\n       content\n       headurl\n       timestamp\n       likedCount\n       realLikedCount\n       liked\n       status\n       replyToUserName\n       replyTo\n       __typename\n     }\n     __typename\n   }\n   __typename\n }\n}\n"
}
conment = requests.post('https://www.kuaishou.com/graphql', headers=headers, json=data)
conments = json.loads(conment.text)
print(conments)

此时，就一个成功返回json数据了。

但是，好像有个问题，只有28条，返回数据不全，what，fuck！，看来是遇到问题了。

没关系，我又重新运行了几次，还是不行。我又回到快手页面。又打开开发者模式。开始找问题。突然我发现，有好多个叫graphql的请求。此时，我就猜想，是不是这个是动态加载的，你将评论往下滑，它才会加载。取好几个graphql的请求数据进行对比，发现只有photoId和pcursor其中的数据不同。经过我的仔细查找发现，这个photoId就是这个视频的id，那个网页地址上都有。不同的视频photold才不同。行了，photold解决了，那pucursor呢，这个是个什么鬼？我看了几个graphql的传输的数据，除了第一个graphql的pucursor是空的。其他的都全是数字。然后我打印了那返回不全的json的数据。发现了一个惊人的秘密。pucursor里面数据是上一个graphql返回的数据里面最后一个人的id。解密了！！！，他是动态渲染的，pucursor接收上一个发评论的人的id来发送下一个请求，从而请求数据。

4.最后来请求完整数据

先写好主程序，最后来个递归就解决一切，评论顺利到手！！！

import requests
import json
import sys
lists = []
def text(w):ww =0list = []headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36 Edg/94.0.992.31','Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_1e83306028e4691a0acb7f78ecfbf145; didv=1632459485000; client_key=65890b29; Hm_lvt_86a27b7db2c5c0ae37fee4a8a35033ee=1632459683; userId=153866369; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABSkrOjzdufZa4lBYXxddWpU-Da_CebI6GVHm2vvSBBxRUv9kJ_BKaJ3weX3FWf3acdLw2yy6uCM1MpHB8Pfi7EnJBlcRb0GqyXYMpPlaCdMqKlVo0PI-iY9zN0wdxA89wOXtml-fWR7CFTT54hsd3PZTsTwlWBA7vhJvwim07-A1RaTmwi66PbBkj7eCrkmJX0hdCln9MiFVyA_CL44TScBoSuDcrlwmr6APhXfdZrBO5uo0FIiA8xE-BzwPs3Wp_Q9mI4y5GcZxo1E-B0xr4CkR4zohqJigFMAE; kuaishou.server.web_ph=eeb2272fa742f8d8f79bd635c3e8044ac1f8'}data = {"operationName": "commentListQuery","variables": {"photoId": "3xhcrvum26yz3vg","pcursor": str(w)},"query": "query commentListQuery($photoId: String, $pcursor: String) {\n visionCommentList(photoId: $photoId, pcursor: $pcursor) {\n   commentCount\n   pcursor\n   rootComments {\n     commentId\n     authorId\n     authorName\n     content\n     headurl\n     timestamp\n     likedCount\n     realLikedCount\n     liked\n     status\n     subCommentCount\n     subCommentsPcursor\n     subComments {\n       commentId\n       authorId\n       authorName\n       content\n       headurl\n       timestamp\n       likedCount\n       realLikedCount\n       liked\n       status\n       replyToUserName\n       replyTo\n       __typename\n     }\n     __typename\n   }\n   __typename\n }\n}\n"
}conment = requests.post('https://www.kuaishou.com/graphql', headers=headers, json=data)conments = json.loads(conment.text)#print(conments['data'])s = len(conments['data']['visionCommentList']['rootComments']) - 1#print(conments['data']['visionCommentList']['rootComments'][s]['commentId'])for ii in range(0,s):#print(len(conments['data']['visionCommentList']['rootComments']))#print(conments['data']['visionCommentList']['rootComments'][s])print(conments['data']['visionCommentList']['rootComments'][ii]['content'])www = conments['data']['visionCommentList']['rootComments'][ii]['content']w = conments['data']['visionCommentList']['rootComments'][ii]['commentId']if len(lists)!=-1 :lists.append(www)print('111111')if len(lists)!=1:a = lists[-2]if lists[-1] == a and len(lists) != 1:print("评论已经爬完")sys.exit()for a in range(0,len(list)):ww +=list[a]print(ww)return w
w = ''
while w!=1:w = text(w)

兄弟们，今天就到这里了，剩余就靠你们自己发挥了，（偷偷告诉你们哦，这个不止可以对快手哦，好多网站都可以这样）。但是还是遵守法律法规。下次教兄弟们怎么用python群发消息，也是post请求哦！