A Slacker's Python Web Scraping Journey (3): Scraping Images from a Web Page

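Now that we have covered the Requests and BeautifulSoup libraries, let's put them into practice today by scraping images from a web page. With what we have learned so far, we can only scrape images that appear in the HTML page itself, not images generated by JavaScript.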
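
To make that limitation concrete, here is a minimal sketch. The HTML string and the helper name `static_image_sources` are made up for illustration and are not taken from the target site. One image is written directly in the HTML, while a second one would only exist after a browser runs the script, so BeautifulSoup never sees it.

```python
from bs4 import BeautifulSoup

# Hypothetical page: one <img> is in the HTML, the other is only created by a script.
SAMPLE_HTML = """
<html><body>
  <img src="/static/a.jpg">
  <script>
    var i = document.createElement('img');
    i.src = '/static/b.jpg';
    document.body.appendChild(i);
  </script>
</body></html>
"""


def static_image_sources(html):
    """Return the src of every <img> tag that is present in the raw HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    return [img['src'] for img in soup.find_all('img') if img.has_attr('src')]


# Only the image written in the HTML is found; the script is never executed.
print(static_image_sources(SAMPLE_HTML))
```

If the images you want are missing from the result of a check like this, the page probably builds them with JavaScript, and the approach in this post will not be enough.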

So I picked this website:

http://www.ivsky.com

The site has many image galleries. We will scrape the one for the film Your Name:

http://www.ivsky.com/bizhi/yourname_v39947/

Let's look at the source of this page:

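We can see that the image information we want is in the page's markup, and each image's address is in an img tag. So we can use BeautifulSoup to parse the page and pull out the image information.

    soup = BeautifulSoup(html, 'html.parser')
    all_img = soup.find_all('img')
    for img in all_img:
        src = img['src']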
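
A `src` value is not always a complete URL. As a hedged aside, the sketch below shows how the standard library's `urljoin` can turn relative or protocol-relative values into absolute ones before downloading. The helper name `absolute_src` and the example `src` values are assumptions for illustration; I have not checked which form this site actually uses.

```python
from urllib.parse import urljoin

# The gallery page from this post, used as the base for resolving src values.
PAGE_URL = 'http://www.ivsky.com/bizhi/yourname_v39947/'


def absolute_src(page_url, src):
    """Resolve an <img> src against the URL of the page it came from."""
    return urljoin(page_url, src)


# Hypothetical src values covering the common cases.
for src in ['http://img.example.com/a.jpg',   # already absolute
            '//img.example.com/b.jpg',        # protocol-relative
            '/img/c.jpg',                     # relative to the site root
            'd.jpg']:                         # relative to the page
    print(absolute_src(PAGE_URL, src))
```

An absolute `src` passes through unchanged, so it is safe to apply this to every value.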

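To fetch the page at a given URL, we use the requests library:

    def getHtmlurl(url):  # fetch the page and return its HTML text
        try:
            r = requests.get(url)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:  # network errors and bad status codes
            return ""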
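
One possible variation, offered as a sketch rather than as the method used in this post: pass a `timeout` so a stalled server cannot hang the script, and send a browser-like `User-Agent` header, since some sites reject the default one. The function name `fetch_html`, the injectable `get` parameter, and the header value are my own assumptions; `get` exists only so the function can be tried without touching the network.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10):
    """Return the page text, or '' if the request fails at the requests level."""
    headers = {'User-Agent': 'Mozilla/5.0'}  # assumed value, adjust as needed
    try:
        r = get(url, timeout=timeout, headers=headers)
        r.raise_for_status()              # turn 4xx/5xx responses into exceptions
        r.encoding = r.apparent_encoding  # guess the encoding from the body
        return r.text
    except requests.RequestException:
        return ''
```

Calling `fetch_html('http://www.ivsky.com/bizhi/yourname_v39947/')` would then behave like `getHtmlurl`, but give up after roughly ten seconds without a response.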

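Next we download each image and save it locally:

    try:  # create the directory if needed, then download the image unless it already exists
        if not os.path.exists(root):
            os.makedirs(root)
        if not os.path.exists(path):
            r = requests.get(img_url)
            r.raise_for_status()
            with open(path, 'wb') as f:  # the with block closes the file for us
                f.write(r.content)
            print("File saved")
        else:
            print("File already exists")
    except (requests.RequestException, OSError):
        print("Download failed")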
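
The file-handling part of this step can be tried without any network access. The sketch below is one way to do it under my own assumptions: the helper names `filename_from_url` and `save_if_missing` are hypothetical, the URL is a placeholder, and a temporary directory stands in for the real download folder. It also strips any query string from the URL before using the last path segment as the file name.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlsplit


def filename_from_url(img_url):
    """Use the last path segment of the URL as the file name, ignoring any query string."""
    return posixpath.basename(urlsplit(img_url).path)


def save_if_missing(root, filename, data):
    """Write data to root/filename unless the file already exists.

    Returns True if the file was written, False if it was already there.
    """
    os.makedirs(root, exist_ok=True)  # also creates missing parent directories
    path = os.path.join(root, filename)
    if os.path.exists(path):
        return False
    with open(path, 'wb') as f:
        f.write(data)
    return True


# Try it in a throwaway directory with placeholder bytes instead of a real image.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'pic', 'yourname')
    name = filename_from_url('http://img.example.com/pic/a.jpg?size=big')
    print(name)
    print(save_if_missing(root, name, b'placeholder bytes'))  # first call writes the file
    print(save_if_missing(root, name, b'placeholder bytes'))  # second call skips it
```

In the real scraper, `data` would be `r.content` from the image request.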

The overall structure of the scraper:

    import os

    import requests
    from bs4 import BeautifulSoup

    def getHtmlurl(url):  # fetch the page
        pass

    def getpic(html):  # extract image URLs and download the images
        pass

    def main():  # main function
        pass

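Here is the complete code:

    import os

    import requests
    from bs4 import BeautifulSoup


    def getHtmlurl(url):  # fetch the page and return its HTML text
        try:
            r = requests.get(url)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:  # network errors and bad status codes
            return ""


    def getpic(html):  # extract image URLs and download each image
        soup = BeautifulSoup(html, 'html.parser')
        all_img = soup.find_all('img')
        for img in all_img:
            src = img.get('src')
            if not src:  # skip <img> tags that have no src attribute
                continue
            img_url = src
            print(img_url)
            root = 'D:/pic/'
            path = root + img_url.split('/')[-1]
            try:  # create the directory if needed, then download unless the file exists
                if not os.path.exists(root):
                    os.makedirs(root)
                if not os.path.exists(path):
                    r = requests.get(img_url)
                    r.raise_for_status()
                    with open(path, 'wb') as f:  # the with block closes the file
                        f.write(r.content)
                    print("File saved")
                else:
                    print("File already exists")
            except (requests.RequestException, OSError):
                print("Download failed")


    def main():
        url = 'http://www.ivsky.com/bizhi/yourname_v39947/'
        html = getHtmlurl(url)
        getpic(html)  # getpic returns nothing, so there is no result to print


    if __name__ == '__main__':
        main()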

Run the code:

We can see that the images have all been saved locally.

That's it for this simple hands-on example. Give it a try yourself.
