Python scraper for 我要个性网 (woyaogexing.com): fetching avatars

Python web-scraping notes
Note up front: this is for personal study only; please do not use it for anything else.
Modules used:

import requests
import re
import os

These are all fairly standard and beginner-friendly; lxml and bs4 are also basic tools worth knowing. Let's keep it short.
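Since lxml and bs4 come up as alternatives to regex, here is a dependency-free sketch of the same idea using the standard library's html.parser (a swapped-in technique, not what this post uses). It collects the href of every <a> tag whose class list contains "img", mirroring the regex used later. The class name ImgLinkParser and the sample HTML are made up for illustration. Unlike a regex, it does not care about attribute order or extra classes.

```python
from html.parser import HTMLParser


class ImgLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class list contains css_class.

    A stdlib-only stand-in for what lxml or bs4 would do; "img" mirrors
    the class name matched by the regex in this post.
    """

    def __init__(self, css_class="img"):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


# Hypothetical sample: the third tag lists its attributes in a different order
sample = (
    '<a href="/touxiang/qinglv/2020/1.html" class="img" target="_blank" title="a">'
    '<a href="/other.html" class="txt">'
    '<a title="b" class="img" href="/touxiang/qinglv/2020/2.html">'
)
parser = ImgLinkParser()
parser.feed(sample)
print(parser.links)  # ['/touxiang/qinglv/2020/1.html', '/touxiang/qinglv/2020/2.html']
```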

headers={'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 Edg/81.0.416.72'}  # request headers
link="https://www.woyaogexing.com/touxiang/qinglv/"  # list page to scrape
r=requests.get(link,headers=headers)
r.encoding=r.apparent_encoding
html=r.text
# print(html)
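The request above works, but it has no timeout and silently parses error pages. A minimal sketch of a hardened version follows, assuming you pass in something that behaves like requests.Session (a get(url, headers=..., timeout=...) method returning a response with raise_for_status(), apparent_encoding, encoding and text). The helper name fetch_html and the 10-second default timeout are my own choices, not part of the original code.

```python
def fetch_html(session, url, headers, timeout=10):
    """Fetch a page and return its decoded text (hypothetical helper).

    session is expected to behave like requests.Session.
    """
    resp = session.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    resp.encoding = resp.apparent_encoding  # same encoding fix as the snippet above
    return resp.text
```

With requests installed, it would be called as fetch_html(requests.Session(), link, headers). Reusing one session also keeps the connection alive across the many detail-page requests made later.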

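Set the request headers and the address to fetch. I call the address link; url is the more common name, it's just personal habit.
Next, analyze the site. This time we use re.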

Use regular expressions to pull out the text we need:

title=re.findall('<div class="h1-title z"><h1>(.*?)</h1><i></i><span>></span></div>',html)
divs=re.compile('<a href="(.*?)" class="img" target="_blank" title=".*?">')
divs=re.findall(divs,html)
# print(divs)
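To see what these two patterns return without hitting the site, here they are run against a small made-up HTML fragment shaped the way the patterns expect (the title text and links are hypothetical). Two things are worth noticing: (.*?) is non-greedy, so each match stops at the first closing quote or tag, and re.findall always returns a list, even when only one title matches.

```python
import re

# Hypothetical fragment shaped like the list page the patterns expect
sample_html = (
    '<div class="h1-title z"><h1>情侣头像</h1><i></i><span>></span></div>'
    '<a href="/touxiang/qinglv/2020/1.html" class="img" target="_blank" title="a">'
    '<a href="/touxiang/qinglv/2020/2.html" class="img" target="_blank" title="b">'
)

title = re.findall('<div class="h1-title z"><h1>(.*?)</h1><i></i><span>></span></div>', sample_html)
divs = re.findall('<a href="(.*?)" class="img" target="_blank" title=".*?">', sample_html)

print(title)  # ['情侣头像'], a list even for a single match
print(divs)   # ['/touxiang/qinglv/2020/1.html', '/touxiang/qinglv/2020/2.html']
```

Because title is a list, anything that formats it into a folder name needs title[0], not title itself.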

Once that tests fine, loop over the results to visit the detail pages that hold the images we actually want:

for div in divs:
    links='https://www.woyaogexing.com'+div  # complete the relative link
    resp=requests.get(links,headers=headers)
    resp.encoding=resp.apparent_encoding
    htmls=resp.text
    # print(htmls)

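Each div is a relative path, so prepending the domain gives the full address, links, which we request a second time.
On that page, use regular expressions again to get the image addresses to download, plus the album title inside <h1>: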

    hrefs=re.compile('<a href="(.*?)" class="swipebox">')
    hrefs=re.findall(hrefs,htmls)
    ids=re.findall('<h1>(.*?)</h1>',htmls)
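This walkthrough builds full URLs by string concatenation: the domain is prefixed to each div, and 'https:' is prefixed to each image href (which implies those hrefs are protocol-relative, starting with //). As an alternative, the standard library's urllib.parse.urljoin handles both cases, and leaves already-absolute URLs alone. The image host below is a made-up example, not the site's real image server.

```python
from urllib.parse import urljoin

base = "https://www.woyaogexing.com/touxiang/qinglv/"

# Site-relative path, like the values collected in divs
print(urljoin(base, "/touxiang/qinglv/2020/1.html"))
# -> https://www.woyaogexing.com/touxiang/qinglv/2020/1.html

# Protocol-relative address ("//host/path"), like the swipebox hrefs
print(urljoin(base, "//img.example.com/2020/05/a.jpeg"))
# -> https://img.example.com/2020/05/a.jpeg

# Absolute URLs pass through unchanged
print(urljoin(base, "https://example.com/x.jpeg"))
# -> https://example.com/x.jpeg
```

In the loop, links = urljoin(link, div) and href = urljoin(link, href) would replace the two concatenations.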

Also set up the save path with the os module. Album titles can contain "/", which would break the path, so it gets replaced:

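    base_path = 'F://我要个性网/%s'%title[0]  # findall returns a list; use its first element
    for name in ids:
        name=re.sub('[/]+','--',name)  # a "/" in the name would break the save path, so replace it
        path = os.path.join(base_path, name)  # build the folder path
        os.makedirs(path, exist_ok=True)  # create it if it does not exist yet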
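Two path details are easy to trip over here. Formatting the findall list directly with '%s' % title puts brackets and quotes into the folder name, and on Windows (the F: drive above) "/" is not the only forbidden character: \ : * ? " < > | are rejected too. The sketch below handles both; safe_name, album_dir and the "untitled" fallback are hypothetical helpers, not part of the original script.

```python
import os
import re

# Characters Windows does not allow in file or folder names
_INVALID = r'[\\/:*?"<>|]+'


def safe_name(name, fallback="untitled"):
    """Replace characters that are illegal in Windows file names with '--'."""
    cleaned = re.sub(_INVALID, "--", name).strip().rstrip(".")
    return cleaned or fallback


def album_dir(base_root, title_matches, album_name):
    """Build <base_root>/<page title>/<album name>.

    title_matches is the list returned by re.findall; only its first
    element is used, with a placeholder if nothing matched.
    """
    title = title_matches[0] if title_matches else "untitled"
    return os.path.join(base_root, safe_name(title), safe_name(album_name))


print(safe_name("a/b:c?"))  # a--b--c--
print(album_dir("out", ["情侣头像"], "我/你"))
```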

Last step: request each href and save the response's content (the raw bytes) to a file.

    for href in hrefs:
        href='https:'+href  # the hrefs are protocol-relative ("//...")
        # print(href)
        tupian=requests.get(href,headers=headers)
        with open(os.path.join(path, href.split('/')[-1]+'.jpeg'),'wb') as f:
            f.write(tupian.content)
        print('Downloading {}'.format(href.split('/')[-1]))
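The file name above is href.split('/')[-1] plus '.jpeg'. If the last URL segment already carries an extension or a query string, that yields names like a.jpeg.jpeg or a.jpeg?x=1.jpeg. A minimal sketch of a more defensive version, assuming you only want the last path segment and a default extension when none is present; file_name_from_url, save_bytes and the "image" fallback name are hypothetical.

```python
import os
from urllib.parse import urlparse


def file_name_from_url(url, default_ext=".jpeg"):
    """Last path segment of a URL, ignoring any query string.

    default_ext is added only when the segment has no extension of its own.
    """
    name = os.path.basename(urlparse(url).path) or "image"
    _, ext = os.path.splitext(name)
    return name if ext else name + default_ext


def save_bytes(folder, url, content):
    """Write raw bytes (e.g. response.content) into folder; return the file path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, file_name_from_url(url))
    with open(path, "wb") as f:
        f.write(content)
    return path


print(file_name_from_url("https://img.example.com/2020/05/a.jpeg?x=1"))  # a.jpeg
print(file_name_from_url("https://img.example.com/2020/05/abc"))         # abc.jpeg
```

In the loop, the with open(...) block would become save_bytes(path, href, tupian.content).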

Done.
Full code

import requests
import re
import os

headers={'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 Edg/81.0.416.72'}  # request headers
link="https://www.woyaogexing.com/touxiang/qinglv/"  # list page to scrape
r=requests.get(link,headers=headers)
r.encoding=r.apparent_encoding
html=r.text
# print(html)
title=re.findall('<div class="h1-title z"><h1>(.*?)</h1><i></i><span>></span></div>',html)
divs=re.compile('<a href="(.*?)" class="img" target="_blank" title=".*?">')
divs=re.findall(divs,html)
# print(divs)
base_path = 'F://我要个性网/%s'%title[0]  # findall returns a list; use its first element
for div in divs:
    links='https://www.woyaogexing.com'+div  # complete the relative link
    resp=requests.get(links,headers=headers)
    resp.encoding=resp.apparent_encoding
    htmls=resp.text
    # print(htmls)
    hrefs=re.compile('<a href="(.*?)" class="swipebox">')
    hrefs=re.findall(hrefs,htmls)
    ids=re.findall('<h1>(.*?)</h1>',htmls)
    path = base_path  # fallback if the detail page has no <h1>
    for name in ids:
        name=re.sub('[/]+','--',name)  # a "/" in the name would break the save path
        path = os.path.join(base_path, name)  # build the folder path
    os.makedirs(path, exist_ok=True)
    for href in hrefs:
        href='https:'+href  # the hrefs are protocol-relative ("//...")
        # print(href)
        tupian=requests.get(href,headers=headers)
        with open(os.path.join(path, href.split('/')[-1]+'.jpeg'),'wb') as f:
            f.write(tupian.content)
        print('Downloading {}'.format(href.split('/')[-1]))

Nice!
Give it a go.
