Downloading Images with a Java Crawler

JAR packages needed to crawl page content
I set this up as a Maven project and added the following dependencies to the pom:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.3</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.7</version>
</dependency>
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>

The crawler code is below. The page being crawled is: http://news.4399.com/gonglue/wzlm/pifu/


import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {

    // Fetch the HTML of the page at the given URL
    public static String gethtml(String html) throws IOException {
        // Create the HTTP client and the GET request
        CloseableHttpClient createDefalt = HttpClients.createDefault();
        HttpGet get = new HttpGet(html);
        String resulthtml = "";
        CloseableHttpResponse response = null;
        try {
            response = createDefalt.execute(get);
            if (response.getStatusLine().getStatusCode() == 200) {
                // Decode the response body as GB2312
                resulthtml = EntityUtils.toString(response.getEntity(), "gb2312");
            }
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the response only if one was obtained
            // (the original tested "== null", which could only throw a NullPointerException)
            if (response != null) {
                try {
                    response.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            createDefalt.close();
        }
        return resulthtml;
    }
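The method above hardcodes "gb2312" as the charset. When a page is read through plain java.net instead of HttpClient, the charset usually has to be picked from the Content-Type header by hand. The following is a minimal sketch of that idea, using only the standard library; the class name CharsetSniffer and the method fromContentType are hypothetical and not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of the original post.
class CharsetSniffer {

    // Returns the charset named in a Content-Type header value such as
    // "text/html; charset=gb2312", or the fallback when none is usable.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: use the fallback
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(fromContentType("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

With an HttpURLConnection, the header value would come from connection.getContentType(), and the fallback could be whatever charset the target site is known to use.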

Parse the HTML page with Jsoup's Document

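    // Parse the page with Jsoup's Document
    public static void parsHtml(String html) throws IOException {
        Document document = Jsoup.parse(html);
        Element list = document.getElementById("hreo_list");
        if (list == null) {
            // Nothing to parse if the fetch failed or the page layout changed
            System.out.println("Element with id hreo_list not found");
            return;
        }
        Elements select = list.select("img");
        for (Element element : select) {
            // The image URL is read from lz_src (presumably a lazy-loading attribute on this page)
            String url = element.attr("lz_src");
            // The alt text is used as the file name
            String fileName = element.attr("alt");
            downImage(url, fileName);
            System.out.println(fileName);
            System.out.println("Download succeeded");
        }
    }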
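The parsing step passes the lz_src value straight to the downloader, which works only if the attribute holds an absolute URL. If a page used relative or protocol-relative paths instead (an assumption, not something the original states about this site), the value could be resolved against the page URL first. A minimal sketch with java.net.URI follows; the class name ImageUrlResolver and the example image paths are hypothetical.

```java
import java.net.URI;

// Hypothetical helper, not part of the original post.
class ImageUrlResolver {

    // Resolves a possibly relative src value against the page URL.
    // URI.create throws IllegalArgumentException if either string is not a valid URI.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://news.4399.com/gonglue/wzlm/pifu/";
        System.out.println(resolve(page, "//img.example.com/a.jpg"));      // protocol-relative
        System.out.println(resolve(page, "/upload/b.jpg"));                // root-relative
        System.out.println(resolve(page, "c.jpg"));                        // relative to the page
        System.out.println(resolve(page, "http://img.example.com/d.jpg")); // already absolute
    }
}
```

Jsoup can do the same job itself: parsing with Jsoup.parse(html, baseUri) and then calling element.absUrl("lz_src") returns the absolute form of the attribute.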

Next, the images on the page are downloaded using I/O streams

    static String file = "D://pifu"; // target download directory

    public static void downImage(String imgurl, String fileName) {
        // Create the target directory if it does not exist yet
        File files = new File(file);
        if (!files.exists()) {
            files.mkdirs();
        }
        try {
            URL url = new URL(imgurl);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            // Create the output file; the original used ".gpg", assumed here to be a typo for ".jpg"
            File fileofImg = new File(file + "/" + fileName + ".jpg");
            // try-with-resources closes both streams even if the copy fails
            try (InputStream is = connection.getInputStream();
                 FileOutputStream out = new FileOutputStream(fileofImg)) {
                // Copy in chunks instead of one byte at a time
                byte[] buffer = new byte[8192];
                int n;
                while ((n = is.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
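The download step uses the alt text directly as the file name, so a name containing characters such as "/" or ":" would make the write fail. The sketch below shows one way to clean the name and to let java.nio.file.Files.copy do the stream copy; the class ImageSaver and its methods are hypothetical, and the in-memory stream stands in for connection.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original post.
class ImageSaver {

    // Replaces characters that Windows forbids in file names with '_'.
    static String safeName(String raw) {
        String cleaned = raw.trim().replaceAll("[\\\\/:*?\"<>|]", "_");
        return cleaned.isEmpty() ? "unnamed" : cleaned;
    }

    // Copies the stream into dir/<safe name>.jpg, creating dir if needed.
    static Path save(InputStream in, Path dir, String name) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(safeName(name) + ".jpg");
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory and fake bytes keep the example self-contained
        Path dir = Files.createTempDirectory("pifu");
        byte[] fake = {1, 2, 3};
        Path saved = save(new ByteArrayInputStream(fake), dir, "a/b:c");
        System.out.println(saved.getFileName() + " " + Files.size(saved));
    }
}
```

Files.copy does not close the input stream, so in real use the stream should still be opened in a try-with-resources block.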

Running the code

    public static void main(String[] args) throws IOException {
        // Fetch the page, then parse it and download every image found
        String html = gethtml("http://news.4399.com/gonglue/wzlm/pifu/");
        parsHtml(html);
    }
} // end of class Test

Result
