How to Download Files by URL in Python and Save Them to the Corresponding Directories

Published: 2020-11-16 14:23:11

Source: 亿速云 (Yisu Cloud)

Views: 58

Author: Leah

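This article explains in detail how to use Python to download files from URLs and save them to the corresponding directories. It is shared here as a reference, in the hope that readers come away with a working understanding of the topic.

Introduction

Image datasets are often distributed as txt files that list the URL of each image rather than the images themselves. For later analysis, the images have to be downloaded and stored in folders by category. This article uses the image classification dataset provided by Alexander Kim on GitHub as an example: it downloads the image samples and saves them by category.

Environment: Python 3.6.5, Anaconda, VSCode

1. Download the dataset files

Create a project folder, download the raw_data folder from the GitHub project mentioned above, and save it in the project directory.

2. Locate the sample files

Write get_doc_path.py, which takes a root directory and collects every dataset file in that directory and its subdirectories.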

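import os

def get_file(root_path, all_files=None):
    '''
    Recursive function: walks root_path and its subdirectories and
    collects the path of every file, keyed by file name.
    '''
    # Create the dict inside the function. A mutable default ({}) would be
    # shared between calls and keep results from earlier calls.
    if all_files is None:
        all_files = {}
    files = os.listdir(root_path)
    for file in files:
        full_path = root_path + '/' + file
        if not os.path.isdir(full_path):  # not a dir
            all_files[file] = full_path
        else:  # is a dir
            get_file(full_path, all_files)
    return all_files

if __name__ == '__main__':
    path = './raw_data'
    print(get_file(path))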
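The recursive traversal can also be written with os.walk from the standard library, which visits subdirectories without explicit recursion. The sketch below is only an alternative under the same assumptions as get_file (file names are unique across subdirectories); the function name get_file_walk is made up for this example and is not part of the original script.

```python
import os


def get_file_walk(root_path):
    """Collect {file name: path} for every file under root_path using os.walk."""
    all_files = {}
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            # Same key convention as get_file: if two files share a name,
            # the one visited later overwrites the earlier entry.
            all_files[name] = os.path.join(dirpath, name)
    return all_files
```

Note that os.path.join uses the platform's path separator, so on Windows the returned paths contain backslashes, unlike the '/' concatenation in get_file.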

3. Download the files

3.1 Read the URL list

# paths is the dict returned by get_doc_path.get_file()
for filename, path in paths.items():
    print('reading file: {}'.format(filename))
    with open(path, 'r') as f:
        lines = f.readlines()
    # One URL per line; drop the trailing newline.
    url_list = []
    for line in lines:
        url_list.append(line.strip('\n'))
    print(url_list)
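The reading step can be wrapped in a small helper that also tolerates blank lines and stray whitespace (for example '\r' from Windows line endings), which would otherwise end up in the list as empty or malformed URLs. This is a minimal sketch; the name read_url_list and the UTF-8 encoding are assumptions, not something the original script specifies.

```python
def read_url_list(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    # encoding='utf-8' is an assumption about how the URL lists are stored.
    with open(path, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]
```

With this helper, the loop body reduces to `url_list = read_url_list(path)`.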

3.2 Create the folder

# filename is the loop variable from step 3.1, e.g. 'urls_drawings.txt'
folder_path = "./picture_get_by_url/pic_download/{}".format(filename.split('.')[0])
if not os.path.exists(folder_path):
    print("Selected folder not exist, try to create it.")
    os.makedirs(folder_path)
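The check-then-create pattern can be shortened with the exist_ok parameter of os.makedirs, and os.path.splitext removes only the final extension from the list file name. The helper below is a sketch; its name make_category_folder and its parameters are hypothetical.

```python
import os


def make_category_folder(base_dir, list_filename):
    """Create base_dir/<list file name without extension> and return its path."""
    category = os.path.splitext(list_filename)[0]
    folder_path = os.path.join(base_dir, category)
    # exist_ok=True makes this a no-op when the folder already exists.
    os.makedirs(folder_path, exist_ok=True)
    return folder_path
```

For a name such as 'urls_drawings.txt' this gives the same folder name as `filename.split('.')[0]`. The two differ only when the name contains more than one dot: splitext keeps everything before the last dot, while split('.')[0] keeps only the part before the first.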

3.3 Download the images

import os
import urllib.request

def get_pic_by_url(folder_path, lists):
    # Create the target folder on first use.
    if not os.path.exists(folder_path):
        print("Selected folder not exist, try to create it.")
        os.makedirs(folder_path)
    for url in lists:
        print("Try downloading file: {}".format(url))
        # Use the last segment of the URL as the local file name.
        filename = url.split('/')[-1]
        filepath = folder_path + '/' + filename
        if os.path.exists(filepath):
            # Already downloaded in an earlier run.
            print("File have already exist. skip")
        else:
            try:
                urllib.request.urlretrieve(url, filename=filepath)
            except Exception as e:
                # Report the failure and continue with the next URL.
                print("Error occurred when downloading file, error message:")
                print(e)
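Taking the last '/'-separated segment works for the plain image URLs in this dataset, but it keeps any query string or fragment in the file name and returns an empty string for URLs that end in '/'. A more defensive version can parse the URL first. The sketch below uses only the standard library; filename_from_url and its default parameter are names chosen for this illustration.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='download'):
    """Derive a local file name from the path component of a URL."""
    path = urlsplit(url).path  # drops the query string and fragment
    name = posixpath.basename(unquote(path))
    # Fall back to a fixed name when the path yields nothing usable.
    if name in ('', '.', '..'):
        return default
    return name
```

For example, `filename_from_url('http://example.com/img/cat.jpg?size=l')` returns `'cat.jpg'`, whereas `url.split('/')[-1]` would return `'cat.jpg?size=l'`.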

4. Complete source code

4.1 get_doc_path.py

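import os

def get_file(root_path, all_files=None):
    '''
    Recursive function: walks root_path and its subdirectories and
    collects the path of every file, keyed by file name.
    '''
    # Create the dict inside the function. A mutable default ({}) would be
    # shared between calls and keep results from earlier calls.
    if all_files is None:
        all_files = {}
    files = os.listdir(root_path)
    for file in files:
        full_path = root_path + '/' + file
        if not os.path.isdir(full_path):  # not a dir
            all_files[file] = full_path
        else:  # is a dir
            get_file(full_path, all_files)
    return all_files

if __name__ == '__main__':
    path = './raw_data'
    print(get_file(path))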

4.2 get_pic.py

import get_doc_path
import os
import urllib.request

def get_pic_by_url(folder_path, lists):
    # Create the target folder on first use.
    if not os.path.exists(folder_path):
        print("Selected folder not exist, try to create it.")
        os.makedirs(folder_path)
    for url in lists:
        print("Try downloading file: {}".format(url))
        # Use the last segment of the URL as the local file name.
        filename = url.split('/')[-1]
        filepath = folder_path + '/' + filename
        if os.path.exists(filepath):
            # Already downloaded in an earlier run.
            print("File have already exist. skip")
        else:
            try:
                urllib.request.urlretrieve(url, filename=filepath)
            except Exception as e:
                # Report the failure and continue with the next URL.
                print("Error occurred when downloading file, error message:")
                print(e)

if __name__ == "__main__":
    root_path = './picture_get_by_url/raw_data'
    paths = get_doc_path.get_file(root_path)
    print(paths)
    for filename, path in paths.items():
        print('reading file: {}'.format(filename))
        with open(path, 'r') as f:
            lines = f.readlines()
        url_list = []
        for line in lines:
            url_list.append(line.strip('\n'))
        # One output folder per URL list, named after the list file.
        foldername = "./picture_get_by_url/pic_download/{}".format(filename.split('.')[0])
        get_pic_by_url(foldername, url_list)
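The except branch in get_pic_by_url prints the error and moves on, so a URL that fails once is only tried again on the next full run. For transient server errors, one option is to retry a few times before giving up. The helper below is a generic sketch, not part of the original script; with_retries, attempts, and delay are hypothetical names.

```python
import time


def with_retries(func, attempts=3, delay=1.0):
    """Call func() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as e:
            if attempt == attempts:
                raise
            print("Attempt {} failed: {}; retrying in {}s".format(attempt, e, delay))
            time.sleep(delay)
```

Inside the existing try block it could be used as `with_retries(lambda: urllib.request.urlretrieve(url, filename=filepath))`, leaving the surrounding error message handling unchanged.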

4.3 Results

Run get_pic.py.

If the program stops unexpectedly or is run again, it automatically skips the files that are already in the folder and continues with the ones that have not been downloaded yet.

{'urls_drawings.txt': './picture_get_by_url/raw_data/drawings/urls_drawings.txt', 'urls_hentai.txt': './picture_get_by_url/raw_data/hentai/urls_hentai.txt', 'urls_neutral.txt': './picture_get_by_url/raw_data/neutral/urls_neutral.txt', 'urls_porn.txt': './picture_get_by_url/raw_data/porn/urls_porn.txt', 'urls_sexy.txt': './picture_get_by_url/raw_data/sexy/urls_sexy.txt'}

reading file: urls_drawings.txt

Try downloading file: http://41.media.tumblr.com/xxxxxx.jpg

Try downloading file: http://ak1.polyvoreimg.com/cgi/img-thing/size/l/tid/xxxxxx.jpg

Error occurred when downloading file, error message:

HTTP Error 502: No data received from server or forwarder

Try downloading file: http://akicocotte.weblike.jp/gaugau/xxxxxx.jpg

Try downloading file: http://animewriter.files.wordpress.com/2009/01/nagisa-xxxxxx-xxxxxx.jpg

Try downloading file: http://cdn.awwni.me/xxxxxx.jpg
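The skip-on-rerun behavior depends on the file at the final path being complete. If the program is interrupted in the middle of a transfer, a partial file may be left behind and then skipped on the next run. One way to reduce that risk is to download to a temporary name and rename it only after the transfer has finished. The following is a sketch under that assumption; download_atomic, the '.part' suffix, and the 30-second timeout are choices made for this example.

```python
import os
import shutil
import urllib.request


def download_atomic(url, filepath, timeout=30):
    """Download url to filepath via a temporary '.part' file."""
    tmp_path = filepath + '.part'
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                open(tmp_path, 'wb') as out:
            shutil.copyfileobj(response, out)
        # Rename only after the copy has finished, so an interrupted
        # transfer never leaves a file at the final path.
        os.replace(tmp_path, filepath)
    except Exception:
        # Clean up the partial file before passing the error on.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In get_pic_by_url, the call to urllib.request.urlretrieve could be swapped for `download_atomic(url, filepath)`; the existing `os.path.exists(filepath)` check then only sees files whose transfer ran to the end.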

That concludes this walkthrough of downloading files by URL in Python and saving them to the corresponding directories. Hopefully it proves useful as a reference.
