Using the requests library

Small downloads: the whole response is read into memory at once, which costs some space, and is then written to disk.

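import requests

image_url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"
r = requests.get(image_url)
r.raise_for_status()  # stop on HTTP errors instead of saving an error page
# r.content holds the entire body in memory as bytes
with open("python_logo.png", "wb") as f:
    f.write(r.content)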
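The choice between the in-memory and the chunked approach can also be made at run time. A minimal sketch, assuming a hypothetical `choose_strategy` helper and an arbitrary 10 MB threshold (neither comes from requests): read the `Content-Length` header and fall back to streaming when it is missing or malformed.

```python
# Hypothetical helper: pick a download strategy from the response headers.
# The 10 MB threshold is an arbitrary assumption, not a requests default.
SMALL_LIMIT = 10 * 1024 * 1024


def choose_strategy(headers, limit=SMALL_LIMIT):
    """Return "memory" for small bodies of known size, otherwise "stream"."""
    length = headers.get("Content-Length")
    if length is None or not length.isdigit():
        # Unknown size (e.g. chunked transfer encoding): stream to be safe
        return "stream"
    return "memory" if int(length) <= limit else "stream"


print(choose_strategy({"Content-Length": "20480"}))      # → memory
print(choose_strategy({"Content-Length": "524288000"}))  # → stream
print(choose_strategy({}))                               # → stream
```

With `stream=True`, `requests.get` returns once the headers have arrived without reading the body, so `r.headers` can be inspected first and then either `r.content` or `r.iter_content()` used. `r.headers` in requests is case-insensitive; the plain dicts above are not.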

Large file downloads:
Write to disk in chunks. The memory needed stays fixed, but if the chunks are too small the program runs slowly.

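import requests

file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"
# stream=True: only the headers are fetched now; the body is read as we iterate
with requests.get(file_url, stream=True) as r:
    r.raise_for_status()
    with open("python.pdf", "wb") as pdf:
        # Write 1 KB at a time, so memory use stays constant
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # skip empty keep-alive chunks
                pdf.write(chunk)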
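The chunked loop does not depend on requests itself. The sketch below factors it into a hypothetical `save_stream` function that accepts any object with an `iter_content(chunk_size=...)` method; `FakeResponse` is an illustrative stand-in so the example runs offline. At most one chunk is held in memory at a time, which is why memory use stays fixed regardless of file size.

```python
import os
import tempfile


def save_stream(response, path, chunk_size=64 * 1024):
    """Write a streamed response to disk one chunk at a time.

    `response` is anything with an iter_content(chunk_size=...) method,
    such as a requests.Response obtained with stream=True.
    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written


class FakeResponse:
    """Hypothetical stand-in for requests.Response so the sketch runs offline."""

    def __init__(self, body):
        self.body = body

    def iter_content(self, chunk_size=1):
        for i in range(0, len(self.body), chunk_size):
            yield self.body[i:i + chunk_size]


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
print(save_stream(FakeResponse(b"x" * 200_000), path, chunk_size=1024))  # → 200000
```

In real use, pass the response from `requests.get(url, stream=True)` instead. The 64 KB default is only an assumption; larger chunks mean fewer loop iterations at the cost of a little more memory per chunk.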

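**Batch file downloads:** use an HTML parser to extract the elements you need (here, the `<a>` links), then filter them by file name.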

import requests
from bs4 import BeautifulSoup

archive_url = "http://www-personal.umich.edu/~csev/books/py4inf/media/"


def get_video_links():
    r = requests.get(archive_url)
    # 'html5lib' requires the html5lib package; 'html.parser' works without it
    soup = BeautifulSoup(r.content, 'html5lib')
    # href=True skips <a> tags that have no href attribute
    links = soup.find_all('a', href=True)
    # Keep only .mp4 links; assumes the hrefs are relative to archive_url
    video_links = [archive_url + link['href'] for link in links
                   if link['href'].endswith('mp4')]
    return video_links


def download_video_series(video_links):
    for link in video_links:
        file_name = link.split('/')[-1]
        print("Downloading file: %s" % file_name)
        # Stream each video to disk in 1 MB chunks
        with requests.get(link, stream=True) as r:
            with open(file_name, 'wb') as f:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    if chunk:
                        f.write(chunk)
        print("%s downloaded!\n" % file_name)
    print("All videos downloaded!")


if __name__ == "__main__":
    video_links = get_video_links()
    download_video_series(video_links)
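The "parse, then filter by file name" step can also be sketched with the standard library alone. This swaps BeautifulSoup for `html.parser.HTMLParser` (so nothing extra needs installing) and uses `urllib.parse.urljoin`, which handles both relative and absolute hrefs, in place of string concatenation. `LinkCollector`, `find_links`, and `file_name_from_url` are hypothetical names, and the sample page is made up.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (stdlib stand-in for soup.find_all('a'))."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore <a> tags without an href
                self.hrefs.append(href)


def find_links(html, base_url, suffix=".mp4"):
    """Return absolute URLs of links whose path ends with `suffix`."""
    parser = LinkCollector()
    parser.feed(html)
    urls = [urljoin(base_url, h) for h in parser.hrefs]
    return [u for u in urls if urlsplit(u).path.lower().endswith(suffix)]


def file_name_from_url(url):
    """Last path segment, ignoring any query string and decoding %XX escapes."""
    return unquote(urlsplit(url).path.rsplit("/", 1)[-1])


page = '<a href="lec%2001.mp4">1</a> <a href="notes.pdf">n</a> <a name="top">t</a>'
links = find_links(page, "http://example.com/media/")
print(links)                         # → ['http://example.com/media/lec%2001.mp4']
print(file_name_from_url(links[0]))  # → lec 01.mp4
```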

A bad example:
writing the file 1 byte at a time (for example, iterating with `chunk_size=1`) wastes a great deal of time.
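Why one-byte writes are slow: each chunk costs one pass through a Python loop plus one `write()` call, so the fixed per-call overhead is multiplied by the number of chunks. The offline sketch below uses `io.BytesIO` in place of a network response and a file (an assumption for illustration; `copy_counting` is a hypothetical helper) and simply counts those calls for a 1 MB payload.

```python
import io


def copy_counting(src, dst, chunk_size):
    """Copy src to dst in chunk_size pieces; return how many write() calls were made."""
    calls = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        calls += 1
    return calls


data = b"\0" * 1_000_000  # a 1 MB payload
for size in (1, 1024, 1024 * 1024):
    calls = copy_counting(io.BytesIO(data), io.BytesIO(), size)
    print(f"chunk_size={size:>8}: {calls} write calls")
```

For 1,000,000 bytes this gives 1,000,000 calls at `chunk_size=1`, 977 at `chunk_size=1024`, and 1 at 1 MB. The real time saved also depends on the network and disk, so treat the call count as a proxy rather than a benchmark.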

Implementation (a download helper with a progress bar)

# -*- coding: UTF-8 -*-
import requests
from contextlib import closing


class ProgressBar(object):
    def __init__(self, title, count=0.0, run_status=None, fin_status=None,
                 total=100.0, unit='', sep='/', chunk_size=1.0):
        super(ProgressBar, self).__init__()
        self.info = "[%s] %s %.2f %s %s %.2f %s"
        self.title = title
        self.total = total
        self.count = count
        self.chunk_size = chunk_size
        self.status = run_status or ""
        self.fin_status = fin_status or " " * len(self.status)
        self.unit = unit
        self.seq = sep

    def __get_info(self):
        # [title] status progress unit separator total unit
        _info = self.info % (self.title, self.status, self.count / self.chunk_size, self.unit,
                             self.seq, self.total / self.chunk_size, self.unit)
        return _info

    def refresh(self, count=1, status=None):
        self.count += count
        self.status = status or self.status
        end_str = "\r"  # overwrite the same console line while downloading
        if self.count >= self.total:
            end_str = '\n'
            self.status = status or self.fin_status
        print(self.__get_info(), end=end_str)


if __name__ == '__main__':
    # url = 'http://www.demongan.com/source/game/二十四点.zip'
    # filename = '二十四点.zip'
    url = input('Enter the URL of the file to download:\n')
    filename = url.split('/')[-1]
    # Process a single response; closing() releases the connection afterwards
    with closing(requests.get(url, stream=True)) as response:
        chunk_size = 1024  # chunk size: 1024 bytes
        if response.status_code == 200:
            # Read the length only after the status check succeeds
            content_size = int(response.headers['content-length'])
            print('File size: %0.2f KB' % (content_size / chunk_size))
            progress = ProgressBar("%s download progress" % filename, total=content_size,
                                   unit="KB", chunk_size=chunk_size,
                                   run_status="Downloading", fin_status="Download complete")
            with open(filename, "wb") as file:
                for data in response.iter_content(chunk_size=chunk_size):
                    file.write(data)
                    progress.refresh(count=len(data))
        else:
            print('Bad link')
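Servers may omit `Content-Length` (for example with chunked transfer encoding), in which case no percentage can be computed. A minimal sketch, with hypothetical helpers `format_progress` and `total_from_headers`, of a progress formatter that degrades to a plain byte count when the total is unknown:

```python
def format_progress(title, done, total=None, unit=1024, unit_name="KB"):
    """Build one progress line; `total` is None when Content-Length is absent."""
    if total:  # also guards against division by zero when total == 0
        pct = 100.0 * done / total
        return "[%s] %.2f %s / %.2f %s (%.1f%%)" % (
            title, done / unit, unit_name, total / unit, unit_name, pct)
    # Without a known total we can only report how much has arrived
    return "[%s] %.2f %s" % (title, done / unit, unit_name)


def total_from_headers(headers):
    """Parse Content-Length, returning None if it is missing or malformed."""
    value = headers.get("Content-Length")
    return int(value) if value and value.isdigit() else None


print(format_progress("demo.zip", 512 * 1024, total_from_headers({"Content-Length": "1048576"})))
# → [demo.zip] 512.00 KB / 1024.00 KB (50.0%)
print(format_progress("demo.zip", 2048, total_from_headers({})))
# → [demo.zip] 2.00 KB
```

Printing the returned string with `end="\r"` gives the same single-line refresh effect as `ProgressBar.refresh`.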
