Here is the complete code first:

import requests
import time
import datetime
import os
import json
import uuid
from pyquery import PyQuery as pq

# Question page: https://www.zhihu.com/question/34243513


def start(offset, sort):
    url = 'https://www.zhihu.com/api/v4/questions/34243513/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset=' + str(offset) + '&platform=desktop&sort_by=' + sort
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
        'cookie': '_zap=d62a474a-43dc-450a-bdcc-cb807357f4ab; _xsrf=3999fea2-0356-42e7-8d2a-ac968d594c86; d_c0="AEDkV8GlCw-PTjUrBeB6hDjqwntD_bakBL8=|1551259687";q_c1=7d117a8c0f564a3ea2a20d02f83e3bb1|1551259688000|1551259688000; tgw_l7_route=66cb16bc7f45da64562a077714739c11'
    }
    res = requests.get(url, headers=headers).text
    data = json.loads(res)
    picRepo = 'picRepo'
    # Create the output directory on first use
    if not os.path.exists(picRepo):
        os.makedirs(picRepo)
    if data.get("data"):
        for i, item in enumerate(data['data']):
            # The answer body is an HTML fragment; parse it and collect the images
            content = pq(item['content'])
            imgUrls = content.find('noscript img').items()
            for imgTag in imgUrls:
                src = imgTag.attr("src")
                # Reuse the extension of the image URL for the local file name
                strIndex = src.rfind('.')
                suffix = src[strIndex:]
                with open(f"{picRepo}/{uuid.uuid4()}{suffix}", 'wb') as f:
                    f.write(requests.get(src).content)


if __name__ == '__main__':
    starttime = datetime.datetime.now()
    strTime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print(f'Started scraping, current time: {strTime}')
    for i in range(3):  # the offset advances in steps of 5; loop 3 times
        start(offset=i * 5, sort='updated')  # updated: newest first; default: default order
        # Sleep for a while here. If the requests go out too fast, some of the
        # downloaded images cannot be opened; the slower the loop, the more of
        # the downloaded images are viewable. The cause is probably Zhihu's
        # anti-scraping mechanism: the images that cannot be opened are
        # actually responses with a 400 Bad Request status code.
        time.sleep(3)
    endtime = datetime.datetime.now()
    print(f'Finished scraping, took {(endtime - starttime).seconds} seconds')


If you run into the following error, converting the .py script's encoding to UTF-8 fixes it:

SyntaxError: Non-UTF-8 code starting with '\xbf' in file python-zhihu -v1.2.py on line 34, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
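One way to do the conversion is with a few lines of Python. This is a minimal sketch that assumes the script was saved as GBK, which is common for editors on Chinese-language Windows but is not stated in the error message. The helper names `reencode_to_utf8` and `convert_file` are hypothetical.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "gbk") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(path: str, source_encoding: str = "gbk") -> None:
    """Rewrite the file at `path` in place as UTF-8 (back it up first)."""
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, "wb") as f:
        f.write(reencode_to_utf8(raw, source_encoding))
```

Calling `convert_file("python-zhihu -v1.2.py")` would rewrite the script named in the error message. The other option described in PEP 263 is to leave the file alone and declare its real encoding in a comment on the first or second line, for example `# -*- coding: gbk -*-`.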

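Preparation:

1. Install Python 3.

2. Install the required third-party modules:

pip install requests

pip install PyQuery

Use the pip show command to check that the modules were installed successfully (output like the following means success):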

C:\Users\addiction\Desktop\zhihu-take>pip show requests

Name: requests

Version: 2.21.0

Summary: Python HTTP for Humans.

Home-page: http://python-requests.org

Author: Kenneth Reitz

Author-email: me@kennethreitz.org

License: Apache 2.0

Location: c:\users\addiction\appdata\local\programs\python\python37-32\lib\site-packages

Requires: chardet, certifi, urllib3, idna

Required-by:

----------------------------------------------------------

C:\Users\addiction\Desktop\zhihu-take>pip show PyQuery

Name: pyquery

Version: 1.4.0

Summary: A jquery-like library for python

Home-page: https://github.com/gawel/pyquery

Author: Gael Pasgrimaud

Author-email: gael@gawel.org

License: BSD

Location: c:\users\addiction\appdata\local\programs\python\python37-32\lib\site-packages

Requires: cssselect, lxml

Required-by:

=====================================Divider===================================

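What follows is a draft I saved on 2018/09/14 at 09:29; this post records and completes that draft as of 2019/02/27.

One day while I was browsing Zhihu, this question showed up on my home page:

I had previously worked through Liao Xuefeng's Python 3 tutorial and picked up some basic Python syntax, and then read Cui Qingcai's book on web scraping. So I decided to use Python 3 to download every image posted under this question, which has more than six thousand answers.

This was my first time putting Python into practice right after learning it, so I was a little excited.

I'm recording the process here.

I won't cover installing Python 3.

First, open a browser and search Zhihu for "你见过最漂亮的女生长什么样" ("What does the most beautiful girl you have ever seen look like?"). If you don't have a Zhihu account, you can search for the same keywords on Baidu or Google and click through to the Zhihu result; the answers under this question can be browsed even when you are not logged in. Here is the Zhihu link directly: https://www.zhihu.com/question/34243513/answer/110939108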

##########################split#####################################

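The approach in brief:

Find the request that loads the answer data;

Analyze the parameters in the request URL;

Send requests in a loop with different parameter values and collect the responses;

Extract the image URLs from each response;

Download each URL with Python and write the bytes to local disk in binary mode.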

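I. Find the request that loads the answer data

Open the question page, press F12 to open the developer tools, and look at the Network tab. There are a lot of requests, and it took me a long time to find the one I wanted. Later I noticed a pattern: among all the requests, the one with the largest size is very likely the one that loads the answer data. See the figure below:

Click the request we found and look for "Request URL:" on the right. That is the full address of the request, as shown below:

I made a mistake here at first: the link I scraped directly was the one in the browser's address bar.

So not every site serves its data from the link in the browser's address bar; most of the data is loaded by asynchronous Ajax requests.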

II. Analyze the parameters in the request URL

Next we analyze the parameters. The URL we found is:

https://www.zhihu.com/api/v4/questions/34243513/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset=5&platform=desktop&sort_by=default

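There are many parameters here, but only three of them matter:

limit: judging by the name, the number of items shown per page

offset: judging by the name, the paging offset

sort_by: clearly the sort order

One detail: when you first open the page, limit is 3, meaning only 3 answers are shown. Only after you click a "load more" style button do limit and offset both become 5. The button on Zhihu looks roughly like this:

The last parameter to cover is "sort_by". From the name it obviously controls how the answers are sorted, and its default value is "default", meaning the default order. As a long-time Zhihu lurker, I had of course noticed the "default sort" control at the top right of every question, which offers two options, as shown:

The value "default" corresponds to the default order, which is by upvote count, descending. When I clicked sort by time, the answers were ordered by date, descending, and by time of day, descending, within the same date. If you have seen enough of the highly upvoted images and want the most recent posts, just change the "sort_by" parameter. As shown, the value for sorting by time is "created".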
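The three parameters can also be assembled programmatically instead of by string concatenation. The following is a rough sketch, not the request the browser actually sends: `build_answers_url` is a hypothetical helper, and the `include` value is shortened to a single field as an assumption, whereas the captured request carries a much longer list. Whether the API accepts the shortened form has not been verified.

```python
from urllib.parse import urlencode

API_BASE = "https://www.zhihu.com/api/v4/questions/{question_id}/answers"


def build_answers_url(question_id, offset, sort_by="default", limit=5,
                      include="data[*].content"):
    """Build the answers URL from limit, offset and sort_by.

    `include` is shortened here (an assumption); the URL captured in the
    browser lists many more fields.
    """
    query = urlencode({
        "include": include,      # percent-encoded by urlencode
        "limit": limit,          # answers per page
        "offset": offset,        # how many answers to skip
        "platform": "desktop",
        "sort_by": sort_by,      # "default", or the time-based value
    })
    return API_BASE.format(question_id=question_id) + "?" + query
```

For example, `build_answers_url(34243513, offset=10, sort_by="created")` requests the third page of five answers sorted by time.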

=======================================

Once the URL and parameters are confirmed, set the request headers and try sending one request with requests, then parse the response as JSON:

url = 'https://www.zhihu.com/api/v4/questions/34243513/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset=' + str(offset) + '&platform=desktop&sort_by=' + sort
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    'cookie': '_zap=d62a474a-43dc-450a-bdcc-cb807357f4ab; _xsrf=3999fea2-0356-42e7-8d2a-ac968d594c86; d_c0="AEDkV8GlCw-PTjUrBeB6hDjqwntD_bakBL8=|1551259687";q_c1=7d117a8c0f564a3ea2a20d02f83e3bb1|1551259688000|1551259688000; tgw_l7_route=66cb16bc7f45da64562a077714739c11'
}
res = requests.get(url, headers=headers).text
data = json.loads(res)

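From here there are just two steps: first, parse the JSON data in data to find the image links; second, download the images.

After json.loads, data is a Python dict, so printing it shows single-quoted keys and strings, and json.cn cannot format that output. With PyCharm it is easy to replace all the single quotes with double quotes.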
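The single quotes appear because printing a dict shows Python's repr rather than JSON. As an alternative to replacing quotes by hand, `json.dumps` serializes the dict back into valid JSON that a formatter will accept. A minimal sketch; the sample dict is made up for illustration and is far smaller than the real response:

```python
import json

# Hypothetical, trimmed-down stand-in for the parsed API response.
data = json.loads('{"data": [{"id": 1, "is_collapsed": false, "content": "<p>示例</p>"}]}')

# str(data) is a Python repr: single quotes and False, which is not JSON.
python_repr = str(data)

# json.dumps emits real JSON: double quotes and lowercase false.
# ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
pretty = json.dumps(data, ensure_ascii=False, indent=2)
print(pretty)
```

The same call also avoids turning booleans into quoted strings such as "False", which is what a plain find-and-replace on the repr tends to produce.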

First, to inspect the returned data more clearly, I formatted it, as shown:

A complete copy of the JSON data follows:

{"data": [

{"id": 608243858,"type": "answer","answer_type": "normal","question": {"type": "question","id": 34243513,"title": "你见过最漂亮的女生长什么样?","question_type": "normal","created": 1438930419,"updated_time": 1508210095,"url": "https://www.zhihu.com/api/v4/questions/34243513","relationship": {}

},"author": {"id": "53913d4b615d9b7e57a1a2a9a5088ec1","url_token": "475203","name": "铛铛","avatar_url": "https://pic4.zhimg.com/da8e974dc_is.jpg","avatar_url_template": "https://pic4.zhimg.com/da8e974dc_{size}.jpg","is_org": "False","type": "people","url": "https://www.zhihu.com/api/v4/people/53913d4b615d9b7e57a1a2a9a5088ec1","user_type": "people","headline": "","badge": [],"gender": 0,"is_advertiser": "False","follower_count": 0,"is_followed": "False","is_privacy": "False"},"url": "https://www.zhihu.com/api/v4/answers/608243858","is_collapsed": "False","created_time": 1551103483,"updated_time": 1551103493,"extras": "","is_copyable": "True","is_normal": "True","voteup_count": 0,"comment_count": 0,"is_sticky": "False","admin_closed_comment": "False","comment_permission": "all","can_comment": {"reason": "","status": "True"},"reshipment_settings": "allowed","content": "

应该是第一次回答吧,不喜绕道,不接受难听的负面评价,因为她是我妹啊~

纯天然,哪都没动过,她妈妈我婶婶年轻时候更美,超美~

我手机里她的照片不多,但是我觉得她本人不说更好看,但有更多种好看的样子。

四大邪术,还是用了滤镜




","editable_content": "","excerpt": "应该是第一次回答吧,不喜绕道,不接受难听的负面评价,因为她是我妹啊~纯天然,哪都没动过,她妈妈我婶婶年轻时候更美,超美~我手机里她的照片不多,但是我觉得她本人不说更好看,但有更多种好看的样子。四大邪术,还是用了滤镜 ","collapsed_by": "nobody","collapse_reason": "","annotation_action": [],"mark_infos": [],"relevant_info": {"is_relevant": "False","relevant_type": "","relevant_text": ""},"suggest_edit": {"reason": "","status": "False","tip": "","title": "","unnormal_details": {"status": "","description": "","reason": "","reason_id": 0,"note": ""},"url": ""},"is_labeled": "False","reward_info": {"can_open_reward": "False","is_rewardable": "False","reward_member_count": 0,"reward_total_money": 0,"tagline": ""},"relationship": {"is_author": "False","is_authorized": "False","is_nothelp": "False","is_thanked": "False","voting": 0,"upvoted_followees": []

}

},

{"id": 608062810,"type": "answer","answer_type": "normal","question": {"type": "question","id": 34243513,"title": "你见过最漂亮的女生长什么样?","question_type": "normal","created": 1438930419,"updated_time": 1508210095,"url": "https://www.zhihu.com/api/v4/questions/34243513","relationship": {}

},"author": {"id": "acdca222c15c201469fa6d5e09c8ec01","url_token": "heck-96","name": "大演说家","avatar_url": "https://pic2.zhimg.com/v2-e6274a52c7dd8f576ad5259115cdf604_is.jpg","avatar_url_template": "https://pic2.zhimg.com/v2-e6274a52c7dd8f576ad5259115cdf604_{size}.jpg","is_org": "False","type": "people","url": "https://www.zhihu.com/api/v4/people/acdca222c15c201469fa6d5e09c8ec01","user_type": "people","headline": "FAE","badge": [],"gender": 1,"is_advertiser": "False","follower_count": 2,"is_followed": "False","is_privacy": "False"},"url": "https://www.zhihu.com/api/v4/answers/608062810","is_collapsed": "False","created_time": 1551088503,"updated_time": 1551088515,"extras": "","is_copyable": "True","is_normal": "True","voteup_count": 0,"comment_count": 0,"is_sticky": "False","admin_closed_comment": "False","comment_permission": "all","can_comment": {"reason": "","status": "True"},"reshipment_settings": "allowed","content": "

这是我见过最好看的女的,没有之一


","editable_content": "","excerpt": "这是我见过最好看的女的,没有之一","collapsed_by": "nobody","collapse_reason": "","annotation_action": [],"mark_infos": [],"relevant_info": {"is_relevant": "False","relevant_type": "","relevant_text": ""},"suggest_edit": {"reason": "","status": "False","tip": "","title": "","unnormal_details": {"status": "","description": "","reason": "","reason_id": 0,"note": ""},"url": ""},"is_labeled": "False","reward_info": {"can_open_reward": "False","is_rewardable": "False","reward_member_count": 0,"reward_total_money": 0,"tagline": ""},"relationship": {"is_author": "False","is_authorized": "False","is_nothelp": "False","is_thanked": "False","voting": 0,"upvoted_followees": []

}

},

{"id": 607635090,"type": "answer","answer_type": "normal","question": {"type": "question","id": 34243513,"title": "你见过最漂亮的女生长什么样?","question_type": "normal","created": 1438930419,"updated_time": 1508210095,"url": "https://www.zhihu.com/api/v4/questions/34243513","relationship": {}

},"author": {"id": "7cec8189ce0617bf66915bd7075c3c2a","url_token": "gwebuehyo","name": "GweBuehyo","avatar_url": "https://pic4.zhimg.com/v2-1f904e8ce52987d90cd110ce10594352_is.jpg","avatar_url_template": "https://pic4.zhimg.com/v2-1f904e8ce52987d90cd110ce10594352_{size}.jpg","is_org": "False","type": "people","url": "https://www.zhihu.com/api/v4/people/7cec8189ce0617bf66915bd7075c3c2a","user_type": "people","headline": "","badge": [],"gender": 1,"is_advertiser": "False","follower_count": 0,"is_followed": "False","is_privacy": "False"},"url": "https://www.zhihu.com/api/v4/answers/607635090","is_collapsed": "False","created_time": 1551058547,"updated_time": 1551058547,"extras": "","is_copyable": "True","is_normal": "True","voteup_count": 0,"comment_count": 0,"is_sticky": "False","admin_closed_comment": "False","comment_permission": "all","can_comment": {"reason": "","status": "True"},"reshipment_settings": "allowed","content": "

我这辈子见过最好看的,是一个路人。话不多说,上图


有些人一辈子只能见一面,555555

","editable_content": "","excerpt": "我这辈子见过最好看的,是一个路人。话不多说,上图有些人一辈子只能见一面,555555","collapsed_by": "nobody","collapse_reason": "","annotation_action": [],"mark_infos": [],"relevant_info": {"is_relevant": "False","relevant_type": "","relevant_text": ""},"suggest_edit": {"reason": "","status": "False","tip": "","title": "","unnormal_details": {"status": "","description": "","reason": "","reason_id": 0,"note": ""},"url": ""},"is_labeled": "False","reward_info": {"can_open_reward": "False","is_rewardable": "False","reward_member_count": 0,"reward_total_money": 0,"tagline": ""},"relationship": {"is_author": "False","is_authorized": "False","is_nothelp": "False","is_thanked": "False","voting": 0,"upvoted_followees": []

}

},

{"id": 574208958,"type": "answer","answer_type": "normal","question": {"type": "question","id": 34243513,"title": "你见过最漂亮的女生长什么样?","question_type": "normal","created": 1438930419,"updated_time": 1508210095,"url": "https://www.zhihu.com/api/v4/questions/34243513","relationship": {}

},"author": {"id": "0","url_token": "","name": "匿名用户","avatar_url": "https://pic1.zhimg.com/aadd7b895_is.jpg","avatar_url_template": "https://pic1.zhimg.com/aadd7b895_{size}.jpg","is_org": "False","type": "people","url": "https://www.zhihu.com/api/v4/people/0","user_type": "people","headline": "","badge": [],"gender": 1,"is_advertiser": "False","follower_count": 0,"is_following": "False","is_followed": "False","is_celebrity": "False","is_blocking": "False","is_blocked": "False","is_privacy": "False"},"url": "https://www.zhihu.com/api/v4/answers/574208958","is_collapsed": "False","created_time": 1547523070,"updated_time": 1551019993,"extras": "","is_copyable": "True","is_normal": "True","voteup_count": 0,"comment_count": 0,"is_sticky": "False","admin_closed_comment": "False","comment_permission": "all","can_comment": {"reason": "","status": "True"},"reshipment_settings": "allowed","content": "

超喜欢她,一个朋友













我的女神,爱死了






奈何已深陷







好害羞~


就应该是没有在一起的可能吧

","editable_content": "","excerpt": "超喜欢她,一个朋友 我的女神,爱死了 奈何已深陷 好害羞~就应该是没有在一起的可能吧","collapsed_by": "nobody","collapse_reason": "","annotation_action": [],"mark_infos": [],"relevant_info": {"is_relevant": "False","relevant_type": "","relevant_text": ""},"suggest_edit": {"reason": "","status": "False","tip": "","title": "","unnormal_details": {"status": "","description": "","reason": "","reason_id": 0,"note": ""},"url": ""},"is_labeled": "False","reward_info": {"can_open_reward": "False","is_rewardable": "False","reward_member_count": 0,"reward_total_money": 0,"tagline": ""},"relationship": {"is_author": "False","is_authorized": "False","is_nothelp": "False","is_thanked": "False","voting": 0,"upvoted_followees": []

}

},

{"id": 289133817,"type": "answer","answer_type": "normal","question": {"type": "question","id": 34243513,"title": "你见过最漂亮的女生长什么样?","question_type": "normal","created": 1438930419,"updated_time": 1508210095,"url": "https://www.zhihu.com/api/v4/questions/34243513","relationship": {}

},"author": {"id": "b77a087c065c67f376e84b0e4f59b478","url_token": "yimei-huo-po-de-luo-xuan","name": "Sponge","avatar_url": "https://pic1.zhimg.com/v2-8e111664b38dd0897d8ec6bed6fb4c7c_is.jpg","avatar_url_template": "https://pic1.zhimg.com/v2-8e111664b38dd0897d8ec6bed6fb4c7c_{size}.jpg","is_org": "False","type": "people","url": "https://www.zhihu.com/api/v4/people/b77a087c065c67f376e84b0e4f59b478","user_type": "people","headline": "请善用语言 使人言可敬\n正在追求马先生笔下的自由","badge": [],"gender": 0,"is_advertiser": "False","follower_count": 9,"is_followed": "False","is_privacy": "False"},"url": "https://www.zhihu.com/api/v4/answers/289133817","is_collapsed": "False","created_time": 1515060176,"updated_time": 1551000513,"extras": "","is_copyable": "True","is_normal": "True","voteup_count": 0,"comment_count": 0,"is_sticky": "False","admin_closed_comment": "False","comment_permission": "all","can_comment": {"reason": "","status": "True"},"reshipment_settings": "allowed","content": "

再次补充



补充回答
又得到了喜欢的颜的照片







其实不止一个...有习惯收美好照片激励自己,不止女生也有男生emmm...





还有很多没水印的没放【除了最后一张太喜欢这个妹子了】再放一张


她整个人的个性风格在我看来是最美的

因为可能偏题了
所以我就匿名了
怕被骂


.....
因为匿名了就看不到自己的答案了...
所以我又取匿了..恩..骂吧

....
突然想到我可以限制评论啊哈哈哈哈哈哈哈哈哈

","editable_content": "","excerpt": "再次补充 补充回答 又得到了喜欢的颜的照片 其实不止一个...有习惯收美好照片激励自己,不止女生也有男生emmm... 还有很多没水印的没放【除了最后一张太喜欢这个妹子了】再放一张 她整个人的个性风格在我看来是最美的因为可能偏题了 所以我就匿名了 怕被骂 …","collapsed_by": "nobody","collapse_reason": "","annotation_action": [],"mark_infos": [],"relevant_info": {"is_relevant": "False","relevant_type": "","relevant_text": ""},"suggest_edit": {"reason": "","status": "False","tip": "","title": "","unnormal_details": {"status": "","description": "","reason": "","reason_id": 0,"note": ""},"url": ""},"is_labeled": "False","reward_info": {"can_open_reward": "False","is_rewardable": "False","reward_member_count": 0,"reward_total_money": 0,"tagline": ""},"relationship": {"is_author": "False","is_authorized": "False","is_nothelp": "False","is_thanked": "False","voting": 0,"upvoted_followees": []

}

}

],"paging": {"is_end": "False","is_start": "True","next": "https://www.zhihu.com/api/v4/questions/34243513/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset=5&platform=desktop&sort_by=updated","previous": "https://www.zhihu.com/api/v4/questions/34243513/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset=0&platform=desktop&sort_by=updated","totals": 7643}

}


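Looking closely at the data above, it is easy to see that the body of each Zhihu answer sits in the content field of each item in data, as shown below:

We can then parse the content field with PyQuery, find the img tags, and read each tag's src attribute, which is the image link: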
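To try the idea offline without installing anything, the sketch below swaps PyQuery for the standard library's `html.parser` and collects the `src` of every `img` nested inside a `noscript` element, which is what the selector `'noscript img'` targets. The class and function names are hypothetical, and the sample HTML is invented: it assumes the `img` outside `noscript` only carries a placeholder.

```python
from html.parser import HTMLParser


class NoscriptImgCollector(HTMLParser):
    """Collect src attributes of <img> tags that appear inside <noscript>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <noscript> elements are currently open
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.depth += 1
        elif tag == "img" and self.depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)

    def handle_endtag(self, tag):
        if tag == "noscript" and self.depth > 0:
            self.depth -= 1


def extract_image_urls(content_html):
    """Return the image URLs found under <noscript> in an HTML fragment."""
    parser = NoscriptImgCollector()
    parser.feed(content_html)
    parser.close()
    return parser.urls
```

Passing an answer's `content` string to `extract_image_urls` would return a list of links comparable to what the PyQuery selector yields.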

picRepo = 'picRepo'
if not os.path.exists(picRepo):
    os.makedirs(picRepo)
if data.get("data"):
    for i, item in enumerate(data['data']):
        content = pq(item['content'])
        imgUrls = content.find('noscript img').items()
        for imgTag in imgUrls:
            src = imgTag.attr("src")
            strIndex = src.rfind('.')
            suffix = src[strIndex:]
            with open(f"{picRepo}/{uuid.uuid4()}{suffix}", 'wb') as f:
                f.write(requests.get(src).content)
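Taking everything after the last dot works for URLs that end in an extension, but it misbehaves if the URL has a query string or no extension at all, because the last dot may then be the one in the host name. A slightly more defensive sketch using only the standard library follows; `image_suffix` and the `.jpg` fallback are assumptions, not part of the original script, and the URLs in the usage note are made up.

```python
import os
from urllib.parse import urlsplit


def image_suffix(src, default=".jpg"):
    """Return the file extension of an image URL, ignoring query and fragment."""
    path = urlsplit(src).path              # drop scheme, host, ?query and #fragment
    suffix = os.path.splitext(path)[1]     # ".jpg", ".png", or "" if none
    return suffix if suffix else default
```

For instance, `image_suffix("https://pic4.zhimg.com/v2-abc_b.png?source=1")` gives `.png`, while the `rfind('.')` approach would carry the query string into the file name.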

That is basically enough to scrape one page of Zhihu data. To scrape several pages, use a loop.
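The paging object in the JSON response includes a totals field (7643 in the sample data), so the offsets for every page can be derived from it rather than hard-coding the number of iterations. A small sketch; `page_offsets` is a hypothetical helper:

```python
def page_offsets(total, limit=5):
    """Return the offset of each page: 0, limit, 2 * limit, ... below total."""
    return range(0, total, limit)


# With 12 answers and 5 per page, three requests are needed.
print(list(page_offsets(12)))
```

Looping over `page_offsets(data['paging']['totals'])` would visit every page; stopping when the paging object's is_end field becomes true is another option.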

Here I wrapped the image-scraping code in a function and call it in a loop, passing in the offset and the sort order, as follows:

if __name__ == '__main__':
    starttime = datetime.datetime.now()
    strTime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print(f'Started scraping, current time: {strTime}')
    for i in range(3):  # Zhihu's offset advances in steps of 5, hence i * 5; loop 3 times starting from 0
        start(offset=i * 5, sort='updated')  # updated: newest first; default: default order
        # Images that cannot be opened are actually responses with a 400 Bad Request status code
        time.sleep(3)
    endtime = datetime.datetime.now()
    print(f'Finished scraping, took {(endtime - starttime).seconds} seconds')
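Since the unreadable files are really 400 responses saved to disk, one possible refinement is to check the status code before writing and to retry after a pause. The sketch below is an assumption about how that could look, not something the original script does: `download_with_retry` is a hypothetical helper, and `fetch` stands for any callable that returns a `(status_code, body_bytes)` pair. Whether retrying actually gets past Zhihu's throttling has not been tested.

```python
import time


def download_with_retry(fetch, url, retries=3, delay=3.0, sleep=time.sleep):
    """Call fetch(url) until it returns status 200 or the retries run out.

    Returns the response body on success, or None if every attempt failed,
    so the caller can skip writing a broken file.
    """
    for attempt in range(retries):
        status, body = fetch(url)
        if status == 200:
            return body
        if attempt < retries - 1:
            sleep(delay * (attempt + 1))  # wait longer after each failure
    return None
```

With requests, `fetch` could be a small function that calls `requests.get(url)` and returns `(response.status_code, response.content)`; the file is then written only when the result is not None.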

All done.

Below are the scraped images. This is my first time writing Python, so please point out anything that is wrong.

The end.
