Scraping Images with Python and Saving Them Locally

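It has been a while since I last wrote a crawler with requests. The target is a website hosted in China and the amount of data is small, so for now I am not using a proxy IP.
1. First, collect the links of the pages to be scraped. (Yes, that sentence sounds a bit circular.)
Starting from url='http://www.supe.com.cn/index.php/Project/index', scrape the link of each industry category on that page and save the links locally.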

The code:

import csv

import requests
from scrapy.selector import Selector

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
}

# Target page to request
url = 'http://www.supe.com.cn/index.php/Project/index'
resp = requests.get(url, headers=headers)
res_html = resp.text

# Parse the page and pull out the wanted content with XPath selectors
select = Selector(resp)
# Category names, e.g. ['地坪案例', '聚脲案例']
area = select.xpath('/html/body//ul[@class="nav clearfix"]//a/text()').extract()
# Category links, e.g. ['/index.php/Project/index/cat_id/14', '/index.php/Project/index/cat_id/15']
area_herf = select.xpath('/html/body//ul[@class="nav clearfix"]//a/@href').extract()


# Helper that appends one row to link.csv
def save_csv(data_list):
    with open('link.csv', 'a', newline='', encoding='utf-8') as f:
        csv.writer(f).writerow(data_list)


# Get the name and link of each industry
hangye = select.xpath('/html/body/div[@class="main"]//div[@class="case bgf"]//li')
for li in hangye:
    hangye_href = li.xpath('./a/@href').extract_first()
    hangye_name = li.xpath('./a/text()').extract_first()
    # Stop at the first <li> without a link; check this before building
    # the page-2 link, otherwise None + '/p/2' raises a TypeError
    if hangye_href is None:
        break
    hangye_href_2 = hangye_href + '/p/2'
    print(hangye_name, hangye_href, hangye_href_2)
    save_csv([hangye_name, hangye_href, hangye_href_2])  # pass a list; it is written as one row
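The links collected above are site-relative (for example /index.php/Project/index/cat_id/14), so they have to be turned into absolute URLs before they can be requested. Below is a minimal sketch of one way to do that with the standard library's urljoin instead of string concatenation. BASE_URL and the helper name build_page_urls are illustrative and not part of the original script; the /p/N pagination pattern is assumed from the '/p/2' suffix used above.

```python
from urllib.parse import urljoin

# Assumption: the site root used throughout this post.
BASE_URL = 'http://www.supe.com.cn'


def build_page_urls(href, pages=2):
    """Return absolute URLs for the first `pages` pages of one category.

    Page 1 is the bare category link; later pages append '/p/<n>',
    mirroring the '/p/2' suffix used in the script above.
    """
    first = urljoin(BASE_URL, href)
    urls = [first]
    for n in range(2, pages + 1):
        urls.append(f'{first}/p/{n}')
    return urls


print(build_page_urls('/index.php/Project/index/cat_id/14'))
```

Whether a category really has a second page is not checked here; the original script makes the same assumption.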

The industry links are saved to link.csv, one row per industry: the name, the link, and the page-2 link.

2. Read link.csv to get the image links and caption text for each industry

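Before saving, you need to create a folder in the project directory to hold the CSV files (the code for this step writes to ./project_case).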
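If you would rather not create that folder by hand, it can be created from code. A minimal sketch, assuming the folder name project_case used by the script in this step; the helper name ensure_output_dir is made up for illustration.

```python
import os

# Assumption: same folder name as in the script for this step.
OUTPUT_DIR = './project_case'


def ensure_output_dir(path=OUTPUT_DIR):
    """Create the folder if it is missing; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path
```

Calling ensure_output_dir() once before the first CSV row is written would be enough, since exist_ok=True makes repeated calls harmless.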

The full code for this step:

# -*- coding: utf-8 -*-
# @Time    : 2020/3/4 9:43
# @Author  : 结尾！！
# @FileName: 抓取工程案例的图片-链接.py
# @Software: PyCharm
import csv
import time
import requests
from scrapy.selector import Selector


def parser_html(hangye, url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
    }
    # Target page to request
    resp = requests.get(url, headers=headers)
    print(url, resp.status_code)
    select = Selector(resp)
    # Get all industries and their links
    # hangye = select.xpath('/html/body/div[@class="main"]//div[@class="case bgf"]//li')  # the industry links are the same on every page
    # Parse the images and their captions  //*[@id="vid_45"]
    anli_tu = select.xpath('//*[@id="vid_45"]/ul//li')
    for tu_li in anli_tu:
        img = tu_li.xpath('./div[@class="picall"]/img/@src').extract_first()
        di_biao = tu_li.xpath('./div[@class="picall"]//em/text()').extract_first()
        jie_shao = tu_li.xpath('./div[@class="picall"]//p/text()').extract_first()
        save_csv(hangye, [img, di_biao, jie_shao])


def save_csv(filename, data_list):
    # The project_case folder for the CSV files must already exist
    with open(f'./project_case/{filename}.csv', 'a', newline='', encoding='utf-8') as f:
        csv.writer(f).writerow(data_list)


if __name__ == '__main__':
    with open('./link.csv', encoding='utf-8') as file:
        links = file.readlines()
    # Build full links such as http://www.supe.com.cn/index.php/Project/index/cat_id/15
    for link in links:
        print(link.split(','))
        line_list = link.split(',')
        link_1 = 'http://www.supe.com.cn' + line_list[1]
        link_2 = 'http://www.supe.com.cn' + line_list[2].strip()
        parser_html(line_list[0], link_1)
        time.sleep(2)
        parser_html(line_list[0], link_2)
        time.sleep(2)
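One thing to watch in the script above: rows are written with csv.writer but read back with split(','). That works as long as no field contains a comma, but csv.writer quotes such fields and a plain split then cuts them in two. A small sketch of reading the rows back with csv.reader instead; the helper name read_rows and the sample row are invented for illustration.

```python
import csv
import io


def read_rows(csv_file):
    """Yield each CSV row as a list of fields, honouring quoted commas."""
    for row in csv.reader(csv_file):
        if row:  # skip blank lines
            yield row


# Demonstration on an in-memory file; the row content is made up.
buffer = io.StringIO()
csv.writer(buffer).writerow(['/uploads/a.jpg', 'Site A', 'epoxy floor, 2000 m2'])

# Naive split: the comma inside the quoted description is split as well.
print(buffer.getvalue().strip().split(','))

# csv.reader: the description stays in one field.
buffer.seek(0)
print(list(read_rows(buffer)))
```

With a real file, open it with newline='' and encoding='utf-8' (matching how it was written) and pass the file object to read_rows.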

3. Download the images using the saved image links and captions


import os

import requests


def request_download(file, list_data):
    file_name = file[:-4]  # drop the '.csv' extension
    os.makedirs(f'./project_case/{file_name}/', exist_ok=True)  # create the folder
    IMAGE_URL = 'http://www.supe.com.cn' + list_data[0]
    print(IMAGE_URL)
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
    }
    r = requests.get(IMAGE_URL, headers=header)
    # The caption (second column) is used as the image file name
    with open(f'./project_case/{file_name}/{list_data[1]}.jpg', 'wb') as f:
        f.write(r.content)


if __name__ == '__main__':
    file_names = os.listdir('./project_case')
    print(file_names)
    for file_each in file_names:
        print(file_each)
        if file_each.endswith('.csv'):
            with open(f'./project_case/{file_each}', encoding='utf-8') as file:
                links = file.readlines()
            for line in links:
                print(line.split(','))
                data = line.split(',')
                request_download(file_each, data)
                # time.sleep(1)

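The final result: each folder holds the images listed in the corresponding CSV file, and the description of any image can be looked up in that CSV.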
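Because the download step uses the caption text directly as the file name, a caption containing characters such as / or : would make open() fail or write to an unexpected path. The following is a sketch of one possible clean-up step, not something the original script does; safe_image_name is a hypothetical helper, and the sample URLs in the test are placeholders.

```python
import os
import re
from urllib.parse import urlparse


def safe_image_name(caption, image_url, default_ext='.jpg'):
    """Build a file name from a caption, keeping the URL's own extension."""
    # Replace characters that are not allowed in Windows or Unix file names.
    stem = re.sub(r'[\\/:*?"<>|\r\n]+', '_', caption).strip(' ._')
    if not stem:
        stem = 'image'  # fall back when the caption is empty
    # Take the extension from the URL path, ignoring any query string.
    ext = os.path.splitext(urlparse(image_url).path)[1]
    return stem + (ext or default_ext)
```

In the download function, the result could replace the hard-coded f'{list_data[1]}.jpg' part of the output path.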

That is the whole process. To add a proxy to requests, you can use the proxy sample code provided by 阿布云 (Abuyun); it is not demonstrated here.
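For reference, requests accepts a proxies mapping on each call. The sketch below shows only that generic mechanism; the host, port, and credentials are placeholders you would replace with your provider's values, both helper names are made up, and the provider's own sample code may look different.

```python
def build_proxies(host, port, user=None, password=None):
    """Return a proxies mapping in the form accepted by requests."""
    # Credentials, if given, go in front of the host as user:password@
    auth = f'{user}:{password}@' if user and password else ''
    proxy_url = f'http://{auth}{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


def fetch_via_proxy(url, proxies, headers=None, timeout=10):
    """Request `url` through the given proxies (needs the requests package)."""
    import requests  # imported here so build_proxies works without it
    return requests.get(url, headers=headers, proxies=proxies, timeout=timeout)
```

A call would then look like fetch_via_proxy(url, build_proxies('proxy.example.com', 8080), headers=headers), where proxy.example.com is a placeholder host.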
