Learning Jsoup: a Java crawler for a photo gallery website

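I recently got interested in web crawlers. Most advice online says Python is the best language for writing them, but I only know Java, so I wondered whether I could build one in Java. A Baidu search showed that Java also has plenty of good open-source crawler frameworks and libraries, including Gecco, webmagic, and Jsoup, which let you scrape data even if you are not very comfortable with regular expressions.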

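This example uses Jsoup to parse the pages. Jsoup lets you pick elements out of HTML with a jQuery-like selector syntax, which makes extracting data very convenient.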
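To make the selector syntax concrete, here is a minimal sketch that parses a small inline HTML string instead of a live page. The markup, the class name SelectorDemo, and both helper methods are made up for illustration; only Jsoup.parse, select, attr, and text are Jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectorDemo {

    // Hypothetical markup, loosely modelled on a gallery list page.
    static final String HTML =
          "<ul class=\"clearfix\">"
        + "<li><a href=\"/a/1.htm\"><span>First</span></a></li>"
        + "<li><a href=\"/a/2.htm\"><span>Second</span></a></li>"
        + "</ul>";

    /** Collects the href of every link found inside an li of ul.clearfix. */
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("ul.clearfix li a")) {
            links.add(a.attr("href"));
        }
        return links;
    }

    /** Collects the text of every span found inside an li of ul.clearfix. */
    static List<String> extractTitles(String html) {
        Document doc = Jsoup.parse(html);
        List<String> titles = new ArrayList<>();
        for (Element span : doc.select("ul.clearfix li span")) {
            titles.add(span.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(extractLinks(HTML));
        System.out.println(extractTitles(HTML));
    }
}
```

The selector strings read much like CSS: "ul.clearfix" matches a ul with class clearfix, and a space means "any descendant".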

An introduction to using Jsoup is available here: http://www.open-open.com/jsoup/

The crawler's target is the image gallery section of https://www.4493.com.

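Results first: the image quality is quite high. Because my disk space is limited, I crawled only ten pages, and that already came to more than 1 GB.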
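The ten pages are addressed by a simple URL pattern taken from the listing in this post: the first list page is the category URL itself, and page N (for N of 2 or more) is the category URL followed by index-N.htm. A small sketch of that rule, where the class name PageUrls and the method pageUrl are hypothetical helpers and the pattern may no longer match the live site:

```java
public class PageUrls {

    /**
     * Returns the list-page URL for a zero-based page index.
     * Index 0 is the category URL itself; index i > 0 maps to index-(i+1).htm.
     */
    static String pageUrl(String categoryUrl, int pageIndex) {
        if (pageIndex == 0) {
            return categoryUrl;
        }
        return categoryUrl + "index-" + (pageIndex + 1) + ".htm";
    }

    public static void main(String[] args) {
        String category = "https://www.4493.com/xingganmote/";
        for (int i = 0; i < 3; i++) {
            System.out.println(pageUrl(category, i));
        }
    }
}
```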

Here is the code:

package .crawl.mote; // the leading package segment is missing in the original post

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Crawls images from 4493; one example using jsoup and one using regular expressions.
 *
 * @author jiuwei
 */
public class main4493 {

    public static final String URL = "https://www.4493.com";

    // For each category: list-page URL, number of pages to crawl, output folder name.

    /** Category 性感美女 */
    public static String XGMN = "https://www.4493.com/xingganmote/";
    public static int xgmnPageCount = 10;
    public static final String XGMN_DIR = "性感美女";

    /** Category 丝袜美腿 */
    public static String SWMT = "https://www.4493.com/siwameitui/";
    public static int swmtPageCount = 0;
    public static final String SWMT_DIR = "丝袜美腿";

    /** Category 唯美写真 */
    public static String WMXZ = "https://www.4493.com/weimeixiezhen/";
    public static int wmxzPageCount = 0;
    public static final String WMXZ_DIR = "唯美写真";

    /** Category 网络美女 */
    public static String WLMN = "https://www.4493.com/wangluomeinv/";
    public static int wlmnPageCount = 0;
    public static final String WLMN_DIR = "网络美女";

    /** Category 高清美女 */
    public static String GQMN = "https://www.4493.com/gaoqingmeinv/";
    public static int gqmnPageCount = 0;
    public static final String GQMN_DIR = "高清美女";

    /** Category 模特美女 */
    public static String MTMN = "https://www.4493.com/motemeinv/";
    public static int mtmnPageCount = 0;
    public static final String MTMN_DIR = "模特美女";

    /** Category 体育美女 */
    public static String TYMN = "https://www.4493.com/tiyumeinv/";
    public static int tymnPageCount = 0;
    public static final String TYMN_DIR = "体育美女";

    /** Category 动漫美女 */
    public static String DMMN = "https://www.4493.com/dongmanmeinv/";
    public static int dmmnPageCount = 0;
    public static final String DMMN_DIR = "动漫美女";

    /** Root output directory. */
    public static File DIR = new File("d:\\4493\\");

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < xgmnPageCount; i++) {
            // The first list page is the category URL itself; later pages are index-2.htm, index-3.htm, ...
            String url = XGMN;
            if (i > 0) {
                url = XGMN + "index-" + (i + 1) + ".htm";
            }
            List<Model> list = getPage(url); // every gallery found on this list page
            downloadJpg(list, XGMN_DIR);
        }
    }

    /**
     * Requests one list page with jsoup and builds a Model for every gallery on it.
     *
     * @param url URL of the list page
     * @return the galleries found on that page (empty if the page could not be loaded)
     */
    public static List<Model> getPage(String url) {
        List<Model> pageUrl = new ArrayList<Model>(); // one Model per gallery
        Document document = getDocument(url); // the list page
        if (document == null) {
            return pageUrl; // request failed, nothing to collect
        }
        Elements ulElement = document.select("ul.clearfix"); // class selector: the ul
        Elements liElement = ulElement.select("li"); // tag selector: its li items

        // Build one Model per li on the current list page
        for (int i = 0; i < liElement.size(); i++) {
            List<String> imgUrlList = new ArrayList<String>();
            Element e = liElement.get(i);
            Model model = new Model();
            model.setTitle(e.select("span").text());
            String aurl = e.select("a").attr("href");
            if (aurl.length() < 6) {
                continue; // href too short to rewrite, skip this item
            }
            // Drop the last 6 characters of the href and append ".htm";
            // the result is the gallery page that shows all images at once.
            aurl = aurl.substring(0, aurl.length() - 6);
            model.setUrl(URL + aurl + ".htm");
            model.setGxsj(e.select("b.b1").text());
            Document document1 = getDocument(model.getUrl());
            if (document1 == null) {
                continue; // gallery page could not be loaded
            }
            Elements divElement = document1.select("div.picsbox"); // the div holding the images
            Elements imgElement = divElement.select("img"); // every img tag inside that div
            for (int j = 0; j < imgElement.size(); j++) {
                imgUrlList.add(imgElement.get(j).attr("src"));
            }
            model.setImgUrl(imgUrlList);
            model.setZsl(imgUrlList.size()); // total count = number of image URLs on the single-page view
            pageUrl.add(model);
        }
        return pageUrl;
    }

    /** Downloads the images to disk, one folder per gallery title. */
    public static void downloadJpg(List<Model> list, String dir2) {
        File file = new File(DIR, dir2);
        if (!file.exists()) {
            file.mkdirs(); // create nested folders
        }
        System.out.println(file + ": directory ready");
        for (int i = 0; i < list.size(); i++) {
            File file1 = new File(file, list.get(i).getTitle());
            if (!file1.exists()) {
                file1.mkdirs(); // create nested folders
            }
            System.out.println(file1 + ": directory ready");
            List<String> srcList = list.get(i).getImgUrl();
            for (int j = 0; j < srcList.size(); j++) {
                String src = srcList.get(j);
                File file2 = new File(file1, (j + 1) + ".jpg");
                if (file2.exists()) {
                    System.out.println(file2 + " already exists, skipping");
                    continue;
                }
                // try-with-resources closes both streams even if the copy fails
                try (BufferedInputStream biStream = new BufferedInputStream(new URL(src).openStream());
                     BufferedOutputStream ouStream = new BufferedOutputStream(new FileOutputStream(file2))) {
                    System.out.println(list.get(i).getTitle() + ": " + src + " download started...");
                    byte[] buf = new byte[1024];
                    int len;
                    while ((len = biStream.read(buf)) != -1) {
                        ouStream.write(buf, 0, len);
                    }
                    System.out.println(list.get(i).getTitle() + " download finished!");
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    /** Fetches and parses a page; returns null if the request fails. */
    public static Document getDocument(String url) {
        Connection connect = Jsoup.connect(url);
        try {
            return connect.timeout(100000).get(); // timeout in milliseconds
        } catch (IOException e) {
            System.out.println("Connection failed or timed out!");
            e.printStackTrace();
            return null;
        }
    }
}
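The listing uses a Model class that is not shown in the post (presumably it ships with the downloadable package). A minimal sketch of what it might look like follows. The field types are inferred from how the getters and setters are called; zsl is the total image count according to the code comment, and reading gxsj as an "update time" string is a guess based on the name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction; the author's actual Model class may differ.
public class Model {
    private String title;                            // gallery title, also used as folder name
    private String url;                              // URL of the gallery page
    private String gxsj;                             // text of b.b1, assumed to be the update time
    private List<String> imgUrl = new ArrayList<>(); // src of every image in the gallery
    private int zsl;                                 // total number of images

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }

    public String getGxsj() { return gxsj; }
    public void setGxsj(String gxsj) { this.gxsj = gxsj; }

    public List<String> getImgUrl() { return imgUrl; }
    public void setImgUrl(List<String> imgUrl) { this.imgUrl = imgUrl; }

    public int getZsl() { return zsl; }
    public void setZsl(int zsl) { this.zsl = zsl; }
}
```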

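The full code is packaged and available at the link below. I wrote it only to practice Jsoup, so it is far from feature-complete.

http://download.csdn.net/download/wangqq335/10106691

Corrections from more experienced developers are welcome!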
