Scraping Photoshop Brush Assets with Python: Downloading Large Files

Python code for scraping Photoshop assets. See the comments for the detailed reasoning.

import os
import random
import re
import time

import requests
from lxml import etree

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/74.0.3729.169 Safari/537.36',
}

count = 0  # running number of the item being downloaded


# Fetch a page and return its text ('' if the request fails)
def get_text(url):
    try:
        response = requests.get(url, headers=HEADERS, timeout=30)
        response.encoding = 'utf-8'
        return response.text
    except requests.RequestException as err:
        print(err)
        return ''


# Get the number of list pages
def get_length(url):
    text = get_text(url)
    if not text:
        return 0
    ehtml = etree.HTML(text)
    pages = ehtml.xpath('//*[@id="zan-page"]/ul/li/a/text()')
    try:
        # The second-to-last pagination link is taken as the last page number
        return int(pages[-2])
    except (IndexError, ValueError):
        # No usable pagination found; assume a single page
        return 1


# Rewrite a detail-page URL so it points straight at the download page
def url_modified(string):
    return string.replace('brushes8.com', 'brushes8.com/xiazaiyemian6').replace('.html', '')


# Collect titles, download-page URLs and thumbnail URLs from a list page
def get_detail_page(url):
    text = get_text(url)
    if not text:
        return [], [], []
    ehtml = etree.HTML(text)
    titles = ehtml.xpath('//div[@id="containere"]//a/@title')
    urls = re.findall(r'<a href="(https://brushes8\.com/\d+\.html)" title=', text)
    urls = [url_modified(u) for u in urls]
    img_urls = ehtml.xpath('//div[@id="containere"]//a//img/@src')
    return titles, urls, img_urls


# Download one thumbnail and one asset archive
def download_file(title, img_url, file_url, file_path, keyword):
    global count
    count += 1
    path = os.path.join(file_path, 'Photoshop Download', keyword.capitalize())
    os.makedirs(path, exist_ok=True)

    # Download the thumbnail
    img_name = os.path.join(path, title + '.jpg')
    try:
        if os.path.exists(img_name):
            print(f'{title}: thumbnail already exists')
        else:
            # Fetch first so a failed request does not leave an empty file behind
            img_data = requests.get(img_url, headers=HEADERS, timeout=30).content
            with open(img_name, 'wb') as img:
                img.write(img_data)
            print(f'Downloaded thumbnail {count}: {title}')
    except (requests.RequestException, OSError):
        pass

    # Download the asset archive
    file_name = os.path.join(path, title + '.7z')
    try:
        # stream=True is the key: the body is fetched as it is read, not all at once
        resp = requests.get(file_url, headers=HEADERS, stream=True, timeout=30)
        resp.raise_for_status()
        file_size = int(resp.headers.get('content-length', 0))
        if os.path.exists(file_name) and os.path.getsize(file_name) == file_size:
            print(f'{title}: archive already exists')
            resp.close()
        else:
            with open(file_name, 'wb') as file:
                size = 0
                print(f'Downloading archive {count}: {title}')
                # Large files need to be downloaded as a stream, chunk by chunk
                for chunk in resp.iter_content(chunk_size=512 * 1024):
                    if chunk:
                        file.write(chunk)
                        size += len(chunk)
                        if file_size:
                            print('\rProgress: {:.1%}'.format(size / file_size), end='')
            print(f'\nArchive {count}: {title}------download complete')
            time.sleep(2 * random.random() + 1)
            print('')
    except (requests.RequestException, OSError) as err:
        print(f'{title}------download failed', err)


def run(key_word):
    start_url = ('https://brushes8.com/category/photoshop-brushes/'
                 f'{key_word}-brushes')
    length = get_length(start_url)
    print(f'This category has {length} pages in total\n')
    url_template = ('https://brushes8.com/category/photoshop-brushes/'
                    '{}-brushes/page/{}')
    page_urls = [url_template.format(key_word, i) for i in range(1, length + 1)]
    file_path = os.getcwd()
    for page_url in page_urls:
        titles, detail_urls, imgs = get_detail_page(page_url)
        for title, detail_url, img in zip(titles, detail_urls, imgs):
            # Strip Chinese punctuation and the word "下载" (download) from the title
            title = re.sub('、|，|。|；|（|）|下载', '', title)
            text = get_text(detail_url)
            if text == '':
                continue
            ehtml = etree.HTML(text)
            links = ehtml.xpath('//ul[@class="xzyemul"]/li[1]/a/@href')
            if not links:
                continue
            download_file(title, img, links[0], file_path, key_word)


if __name__ == '__main__':
    # keyword could be turned into a list in order to crawl with multiple processes
    keyword = 'light'
    run(keyword)
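The chunk-by-chunk write is the core of the large-file download, so here it is on its own as a rough sketch. `save_stream` is a hypothetical helper, not part of the script above. It adds one idea the script does not use: writing to a temporary `.part` file and renaming it only once every chunk is on disk, so an interrupted download is never mistaken for a finished archive. The fake chunk list stands in for `resp.iter_content(...)` so the example runs without network access.

```python
import os
import tempfile


def save_stream(chunks, dest, expected_size=None):
    """Write an iterable of byte chunks to dest via a temporary .part file.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    part = dest + '.part'
    written = 0
    with open(part, 'wb') as fh:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fh.write(chunk)
                written += len(chunk)
    if expected_size is not None and written != expected_size:
        # Size mismatch: discard the partial file instead of keeping it
        os.remove(part)
        raise OSError(f'incomplete download: {written} of {expected_size} bytes')
    os.replace(part, dest)  # rename only after the whole file is on disk
    return written


if __name__ == '__main__':
    # Stand-in for resp.iter_content(chunk_size=512 * 1024)
    fake_chunks = [b'abc', b'', b'defg']
    target = os.path.join(tempfile.mkdtemp(), 'demo.7z')
    print(save_stream(fake_chunks, target, expected_size=7))
```

With requests, the call would presumably look like `save_stream(resp.iter_content(chunk_size=512 * 1024), file_name, file_size)`, where `resp` comes from `requests.get(..., stream=True)`.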
