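Errors encountered in my own experiments:

I. javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

II. 403 after passing SSL verification

Preface: the reference book is 《网络数据采集技术 Java网络爬虫实战》 (钱洋、姜元春)

1. Web crawlers and Java basics

1.1 Collections

1.1.1 List and Set

1.1.2 Map and Queue

1.2 The String class

1.3 Date and time handling

1.3.1 UNIX timestamp handling

1.4 Regular expressions

1.5 JAR dependencies

1.5.1 log4j logging in detail

1.5.2 log4j configuration example (log4j.properties)

1.5.3 log4j configuration test example

2. HTTP basics and packet capture

2.1 Introduction to HTTP

2.2 URLs and access steps

2.2.1 How a browser fetches a server resource, step by step

2.3 Messages

2.4 HTTP request methods

2.5 HTTP status codes

2.6 HTTP headers

2.6.1 General headers

2.6.2 Request headers

2.6.3 Response headers

2.6.4 Entity headers

2.6.5 Inspecting traffic in Chrome

3. Tools for fetching page content (Jsoup and HttpClient are covered below; URLConnection is left for self-study)

3.1 Using Jsoup

3.1.1 What Jsoup does

3.1.2 Requesting a URL

3.1.3 Setting headers and why

3.1.4 Submitting request parameters

3.1.5 Timeouts

3.1.6 Using a proxy server

3.1.7 Reading the response as a stream (downloading images, PDFs, ...)

3.1.8 HTTPS requests (SSL)

3.1.9 Fetching large files

3.2 Using HttpClient

3.2.1 HttpClient dependency

3.2.2 Requesting a URL

3.2.3 The EntityUtils class

3.2.4 Setting headers

3.2.5 Submitting a form with POST

3.2.6 Timeouts

3.2.7 Using a proxy server

3.2.8 File download

3.2.9 HTTPS requests (needed for some sites)

3.2.10 Request retry

3.2.11 Executing requests from multiple threads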

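Errors encountered in my own experiments:

I. javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Cause: the request targets an HTTPS site whose certificate chain the JVM cannot validate, and the code does not install its own trust manager (an X509TrustManager) to handle the certificate check.

The stack trace shows that the JVM is missing the relevant SSL certificate. There are two fixes: 1. export the certificate from a browser and import it into the JVM's trust store; 2. skip certificate verification in code. These notes use the second option (details are in the HTTPS sections). Skipping verification removes the protection against forged certificates, so it is only suitable for experiments.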

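1) The HTTPS communication process:

A client talking to a web server over HTTPS goes through the following steps.

(1) The client accesses the web server with an https URL and asks to establish an SSL connection.

(2) On receiving the request, the web server sends its certificate (which contains the public key) to the client.

(3) The client's browser and the web server negotiate the security level of the SSL connection, that is, the strength of the encryption.

(4) Based on the agreed security level, the client's browser creates a session key, encrypts it with the site's public key, and sends it to the site.

(5) The web server decrypts the session key with its own private key.

(6) The web server uses the session key to encrypt its communication with the client.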

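2) Certificate trust rules in Java programs

As described above, the client receives the server's certificate. The caller (the client) keeps a list of trusted certificates and uses it to decide whether the received certificate can be trusted.

When a browser finds an untrusted certificate on an HTTPS resource, it usually shows a dialog saying the certificate is not trusted and asks whether to continue. The JVM does not use the operating system's certificate store directly; it has its own trust store (the cacerts file shipped with the JDK) which, like the operating system's, comes preloaded with a set of trusted root certificates. If your HTTPS site's certificate was issued by a commercial CA that chains to one of these roots, accessing the site from Java works without extra effort. If the certificate is not trusted, Java reports the error shown at the top of this article.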
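To see the trust list described above, the following sketch asks the JDK for its default trust managers and counts the root certificates they accept. It uses only standard JDK classes; the class name TrustStoreInfo and the method name countTrustedIssuers are illustrative assumptions, not part of the original notes.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

public class TrustStoreInfo {

    // Counts the CA certificates accepted by the JVM's default trust store.
    static int countTrustedIssuers() {
        try {
            TrustManagerFactory factory =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            factory.init((KeyStore) null); // null selects the JVM default trust store
            int count = 0;
            for (TrustManager manager : factory.getTrustManagers()) {
                if (manager instanceof X509TrustManager) {
                    count += ((X509TrustManager) manager).getAcceptedIssuers().length;
                }
            }
            return count;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load the default trust store", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Trusted CA certificates: " + countTrustedIssuers());
    }
}
```

A site whose certificate does not chain to one of these issuers is the kind that triggers the PKIX error above.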

II. 403 after passing SSL verification

Cause: the request does not look enough like a browser request, so the server answers 403 (Forbidden).

Solution: set a browser-like User-Agent header on the request:

httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36");

Full snippet:

String url = "https://www.creditchina.gov.cn/xinyongfuwu/?navPage=5";
SSLClient sslClient = new SSLClient();   // instantiate the helper
// Build a client that passes the SSL check
HttpClient httpClientSSL = sslClient.initSSLClient("TLS");
// Create the request
HttpGet httpGet = new HttpGet(url);
// Key step: set the User-Agent request header
httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36");
// Execute the request and get the response
HttpResponse httpResponse = null;
try {
    httpResponse = httpClientSSL.execute(httpGet);
} catch (IOException e) {
    e.printStackTrace();
}

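Preface: the reference book is 《网络数据采集技术 Java网络爬虫实战》 (钱洋、姜元春)

1. Web crawlers and Java basics

1.1 Collections

Web crawlers rely on collections such as List, Set, Queue and Map, all of which live in the java.util package.

1.1.1 List and Set

Collection	Characteristics
List	Stores elements in linear order; duplicates are allowed.
Set	No guaranteed order; duplicate elements are filtered out.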

List:

// Create a List
List<String> urllist = new ArrayList<String>();
// Add elements
urllist.add("https://www.baidu.com/1");
urllist.add("https://www.baidu.com/2");
// Traversal 1: enhanced for loop
for (String url : urllist) {
    System.out.println(url);
}
// Traversal 2: index-based loop
for (int i = 0; i < urllist.size(); i++) {
    System.out.println(i + ":" + urllist.get(i));
}
// Traversal 3: an Iterator over urllist
Iterator<String> it = urllist.iterator();
while (it.hasNext()) {
    System.out.println(it.next());
}

Set:

// Create a Set
Set<String> set = new HashSet<String>();
// Add elements
set.add("https://www.baidu.com/1");
set.add("https://www.baidu.com/2");
// Traverse with an enhanced for loop
for (String url : set) {
    System.out.println(url);
}

1.1.2 Map and Queue

Collection	Characteristics
Map	Maps key objects to value objects; every element has a key and a value.
Queue	Stores data as a linked structure, first in first out; elements are added at one end and removed at the other.

Map:

// Initialize the map
Map<String, Integer> map = new HashMap<String, Integer>();
// Add values
map.put("id1", 100);
map.put("id2", 200);
map.put("id3", 300);
map.put("id4", 400);
/*
 * Iterating Map.entrySet() yields entries that already hold both key and value.
 * Iterating keySet() yields only keys, so each value needs a second lookup in the map.
 * entrySet() therefore usually performs better than keySet() plus get().
 */
// Way 1: widely used, but slower than ways 2 and 3 because of the second lookup
System.out.println("Traverse key and value via Map.keySet:");
for (String key : map.keySet()) {      // iterate over the keys
    Integer value = map.get(key);      // look up the value for the key
    System.out.println("key=" + key + ",value=" + value);
}
// Way 2: iterate Map.entrySet with an explicit Iterator
System.out.println("Traverse key and value via a Map.entrySet iterator:");
Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String, Integer> entry = it.next();
    System.out.println("key=" + entry.getKey() + ",value=" + entry.getValue());
}
// Way 3 (recommended, especially for large maps): elements cannot be removed inside this for loop
System.out.println("Traverse key and value via Map.entrySet:");
for (Map.Entry<String, Integer> entry : map.entrySet()) {
    System.out.println("key=" + entry.getKey() + ",value=" + entry.getValue());
}
// Way 4: values only, the keys are not available
System.out.println("Traverse all values via Map.values(); keys are not available:");
for (Integer value : map.values()) {
    System.out.println("value :" + value);
}

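Queue:

Operation	Throws exception	Returns special value
Add an element	add(e)	offer(e)
Retrieve and remove the head	remove()	poll()
Retrieve the head without removing it	element()	peek()

/*
 * Common queue operations. add(), remove() and element() throw an exception
 * on failure, so offer(), poll() and peek() are usually preferred.
 */
Queue<String> urlQueue = new LinkedList<String>();
urlQueue.offer("https://www.baidu.com/1");
urlQueue.offer("https://www.baidu.com/2");
for (String url : urlQueue) {          // traverse
    System.out.println(url);
}
// Retrieve and remove the head
System.out.println("First url: " + urlQueue.poll());
for (String url : urlQueue) {          // traverse after removal
    System.out.println(url);
}
// Retrieve the head without removing it (throws if the queue is empty)
System.out.println("First url: " + urlQueue.element());
for (String url : urlQueue) {          // traverse after retrieval
    System.out.println(url);
}
if (urlQueue.isEmpty()) {
    System.out.println("The queue is empty");
} else {
    System.out.println("The queue is not empty, size: " + urlQueue.size());
}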
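In a crawler, Queue and Set are often combined: the queue holds URLs still to visit and the set remembers every URL already seen, so duplicates are never queued twice. A minimal sketch of that idea; the class name UrlFrontier and its methods are illustrative assumptions, not taken from the book.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;

public class UrlFrontier {
    private final Queue<String> pending = new LinkedList<>(); // URLs waiting to be crawled (FIFO)
    private final Set<String> seen = new HashSet<>();         // every URL ever offered

    // Enqueues the URL only if it has not been offered before.
    public boolean offer(String url) {
        if (seen.add(url)) {              // Set.add returns false for a duplicate
            return pending.offer(url);
        }
        return false;
    }

    // Returns the next URL in FIFO order, or null when nothing is pending.
    public String poll() {
        return pending.poll();
    }

    public int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        UrlFrontier frontier = new UrlFrontier();
        frontier.offer("https://www.baidu.com/1");
        frontier.offer("https://www.baidu.com/2");
        frontier.offer("https://www.baidu.com/1"); // duplicate, ignored
        System.out.println("Pending: " + frontier.size());
        System.out.println("Next: " + frontier.poll());
    }
}
```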

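1.2 The String class

Method	Return type	Description
length()	int	Returns the length of the string
equals(Object obj)	boolean	Tests whether two strings are equal
concat(String s)	String	Concatenates two strings
contains(CharSequence s)	boolean	Tests whether the string contains s
substring(int beginIndex)	String	Returns the substring from beginIndex to the end
substring(int beginIndex,int endIndex)	String	Returns the substring from beginIndex (inclusive) to endIndex (exclusive)
indexOf(String s)	int	Searches from the start for s and returns the index of its first occurrence, or -1 if it is absent
startsWith(String prefix)	boolean	Tests whether the string starts with prefix
startsWith(String prefix,int toffset)	boolean	Tests whether the substring beginning at index toffset starts with prefix
endsWith(String suffix)	boolean	Tests whether the string ends with suffix
trim()	String	Removes leading and trailing whitespace
toLowerCase()	String	Converts all characters to lower case
toUpperCase()	String	Converts all characters to upper case

Type conversion:

String -> int : Integer.parseInt(string);

String -> double : Double.parseDouble(string);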

1.3 Date and time handling

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimeTest {
    public static void main(String[] args) {
        System.out.println(parseStringTime("2016-05-19 19:17", "yyyy-MM-dd HH:mm", "yyyy-MM-dd HH:mm:ss"));
        System.out.println(parseStringTime("2018-06-19", "yyyy-MM-dd", "yyyy-MM-dd HH:mm:ss"));
    }

    /**
     * Normalizes a time given as a string.
     * @param inputTime       the time string to convert
     * @param inputTimeFormat the format of the input
     * @param outTimeFormat   the format of the output
     * @return the converted time string, or null if parsing fails
     */
    public static String parseStringTime(String inputTime, String inputTimeFormat, String outTimeFormat) {
        String outputDate = null;
        try {
            // Parse the input string into a Date
            Date inputDate = new SimpleDateFormat(inputTimeFormat).parse(inputTime);
            // Format the Date in the new pattern
            outputDate = new SimpleDateFormat(outTimeFormat).format(inputDate);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        return outputDate;
    }
}

1.3.1 UNIX timestamp handling

// Converts a UNIX timestamp (in seconds) to a time string in the given format.
// Requires java.text.SimpleDateFormat, java.util.Date and java.util.Locale.
public static String TimeStampToDate(String timestampString, String formats) {
    long timestamp = Long.parseLong(timestampString) * 1000;   // seconds -> milliseconds
    String date = new SimpleDateFormat(formats, Locale.CHINA).format(new Date(timestamp));
    return date;
}
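The same conversion can be written with the java.time API (Java 8+), which makes the time zone explicit instead of relying on the machine's default zone. This is an alternative sketch, not the book's method; the class name EpochFormatter is an assumption.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class EpochFormatter {

    // Formats a UNIX timestamp given in seconds, interpreted in the given time zone.
    static String format(long epochSeconds, String pattern, ZoneId zone) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(zone)
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd HH:mm:ss";
        System.out.println(format(1539043748L, pattern, ZoneOffset.UTC));              // 2018-10-09 00:09:08
        System.out.println(format(1539043748L, pattern, ZoneId.of("Asia/Shanghai")));  // 2018-10-09 08:09:08
    }
}
```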

1.4 Regular expressions

In-depth explanation:

https://blog.csdn.net/generalfu/article/details/114933838

Filtering demo:

public class ZhengZhe {
    public static void main(String[] args) {
        String str = "a1b2c3dAZ4";   // sample to filter
        String strReplace1 = str.replaceAll("[abc]", "");
        System.out.println("Result of replacing [abc]: " + strReplace1);
        // Result of replacing [abc]: 123dAZ4
        String strReplace2 = str.replaceAll("[^abc]", "");
        System.out.println("Result of replacing [^abc]: " + strReplace2);
        // Result of replacing [^abc]: abc
        String strReplace3 = str.replaceAll("[a-zA-Z]", "");
        System.out.println("Result of replacing [a-zA-Z]: " + strReplace3);
        // Result of replacing [a-zA-Z]: 1234
        String strReplace4 = str.replaceAll("[1-9]", "");
        System.out.println("Result of replacing [1-9]: " + strReplace4);
        // Result of replacing [1-9]: abcdAZ
        String strReplace5 = str.replaceAll("[a-d1-3]", "");
        System.out.println("Result of replacing [a-d1-3]: " + strReplace5);
        // Result of replacing [a-d1-3]: AZ4
        String strReplace6 = str.replaceAll("AZ[4]?", "");
        System.out.println("Result of replacing AZ[4]?: " + strReplace6);
        // Result of replacing AZ[4]?: a1b2c3d
    }
}
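Besides filtering with replaceAll, crawlers often use java.util.regex to pull values out of text. The sketch below extracts absolute http/https links from href attributes. The class name LinkExtractor and the pattern are illustrative assumptions; for real pages an HTML parser such as Jsoup is more robust than a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Group 1 captures an absolute http/https URL inside href="...".
    private static final Pattern HREF = Pattern.compile("href=\"(https?://[^\"]+)\"");

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {           // find() moves to the next match
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"https://www.baidu.com/1\">a</a>"
                + "<a href=\"/local\">b</a>"
                + "<a href=\"http://i.chaoxing.com/\">c</a>";
        System.out.println(extractLinks(html)); // the relative link /local is skipped
    }
}
```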

1.5 JAR dependencies

<!-- MySQL JDBC driver -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.35</version>
</dependency>
<!-- HTML parser -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
<!-- HttpClient: a programmatic API for sending and receiving HTTP messages.
     Version 4.5.5 matches section 3.2.1; the HttpClients class used later requires 4.3 or newer. -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.5</version>
</dependency>
<!-- Logging framework -->
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
</dependency>

1.5.1 log4j logging in detail

Step 1: configure the root Logger

log4j.rootLogger = [level] ,appenderName,appenderName,...

level (log level): DEBUG < INFO < WARN < ERROR < FATAL (fatal errors that abort the program)

appenderName (name of a log destination): any name you choose, e.g. stdout (console), D (debug), I (info), ...

Step 2: configure the log destinations (appenders)

log4j.appender.appenderName = fully.qualified.name.of.appender.class

log4j.appender.appenderName.option1 = value1

log4j.appender.appenderName.optionN = valueN

fully.qualified.name.of.appender.class is one of:

org.apache.log4j.ConsoleAppender (console)
org.apache.log4j.FileAppender (file)
org.apache.log4j.DailyRollingFileAppender (one log file per day)
org.apache.log4j.RollingFileAppender (starts a new file when the current one reaches a given size)
org.apache.log4j.WriterAppender (sends log output as a stream to any destination); it cannot be configured directly, so it is rarely used

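ConsoleAppender (console) options:

Threshold = DEBUG: lowest level that is output; the default is DEBUG.
ImmediateFlush = true: whether each message is output immediately; the default is true.
Target = System.err: write to System.err; the default is System.out.

FileAppender (file) options:

Threshold = DEBUG: lowest level that is output; the default is DEBUG.
ImmediateFlush = true: whether each message is output immediately; the default is true.
File = E:\logs.txt: write messages to logs.txt on drive E.
Append = false: the default true appends messages to the file; false overwrites the file.

DailyRollingFileAppender (one log file per day) options:

Threshold = DEBUG: lowest level that is output; the default is DEBUG.
ImmediateFlush = true: whether each message is output immediately; the default is true.
File = E:\logs.txt: write messages to logs.txt on drive E.
Append = false: the default true appends messages to the file; false overwrites the file.
DatePattern = '.'yyyy-ww: roll the file once a week, i.e. start a new file every week. (Rolling by month, week, day, hour or minute is possible.)

RollingFileAppender (new file at a given size) options:

Threshold = DEBUG: lowest level that is output; the default is DEBUG.
ImmediateFlush = true: whether each message is output immediately; the default is true.
File = E:\logs.txt: write messages to logs.txt on drive E.
Append = false: the default true appends messages to the file; false overwrites the file.
MaxFileSize = 100KB: the suffix can be KB, MB or GB. A new file is started when the log file reaches this size.
MaxBackupIndex: the maximum number of rolled-over backup files to keep.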

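Step 3: configure the layout of log messages

log4j.appender.appenderName.layout = fully.qualified.name.of.layout.class

log4j.appender.appenderName.layout.option1 = value1

log4j.appender.appenderName.layout.optionN = valueN

fully.qualified.name.of.layout.class is one of:

org.apache.log4j.HTMLLayout (HTML table layout)
org.apache.log4j.PatternLayout (flexible, pattern-based layout)
org.apache.log4j.SimpleLayout (level and message string only)
org.apache.log4j.TTCCLayout (time, thread, category and related information)

HTMLLayout (HTML table layout) options:

LocationInfo = true: default false; when true, the Java file name and line number are output.
Title = Test_WARN: the default title is Log4J Log Messages; here it is set to Test_WARN.

PatternLayout (flexible, pattern-based layout) options:

ConversionPattern = %m%n: output in the given format, built from the following conversion characters:

%p: the priority (level), i.e. DEBUG, INFO, ...
%r: milliseconds elapsed from application start to this log event.
%c: the category (logger name), usually the fully qualified class name.
%m: the log message itself.
%n: a platform-dependent line separator.
%d: date or time of the log event, ISO8601 by default; a format can be given after it (%d{yyyy-MM-dd HH\:mm\:ss,SSS}).
%l: where the log event occurred: the calling class and method, the source file name and the line number.
%t: name of the thread that generated the log event.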
1.5.2 log4j configuration example (log4j.properties)

### Configure the root Logger ###
log4j.rootLogger = debug,stdout,D,E,W
### Output to the console ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern =%p[%d{yyyy-MM-dd HH\:mm\:ss,SSS}] [%t] %C.%M(%L) | %m%n
### Log DEBUG and above to E:/logs/log.log ###
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = E:/logs/log.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = DEBUG
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = %p[%d{yyyy-MM-dd HH\:mm\:ss,SSS}] [%t] %C.%M(%L) | %m%n
### Log ERROR and above to E:/logs/error.log ###
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File = E:/logs/error.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.HTMLLayout
log4j.appender.E.layout.LocationInfo =true
log4j.appender.E.layout.Title = Test_ERROR
### Log WARN and above to E:/logs/warn.log ###
log4j.appender.W = org.apache.log4j.RollingFileAppender
log4j.appender.W.File = E:/logs/warn.log
log4j.appender.W.Append = true
log4j.appender.W.Threshold = WARN
log4j.appender.W.MaxFileSize = 2KB
log4j.appender.W.layout = org.apache.log4j.HTMLLayout
log4j.appender.W.layout.LocationInfo = true
log4j.appender.W.layout.Title = Test_WARN

1.5.3 log4j configuration test example

import org.apache.log4j.Logger;

public class Log4jTest {
    static final Logger logger = Logger.getLogger(Log4jTest.class);

    public static void main(String[] args) {
        System.out.println("hello");   // plain console output
        // Log messages at different levels
        logger.info("hello world");
        logger.debug("this is debug msg");
        logger.warn("this is warn msg");
        logger.error("this is error msg");
    }
}

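2. HTTP basics and packet capture

Overview of the transfer:

Figure 2.1 Data transfer over HTTP

2.1 Introduction to HTTP

Some of the data types that are transferred:

text/html: HTML document.
image/jpeg: JPEG image.
image/png: PNG image.
image/webp: WebP image.
image/gif: GIF image.
text/plain: plain text document.
application/json: JSON content.
video/mp4: MP4 video.
video/quicktime: Apple QuickTime video (MOV format).
video/x-msvideo: AVI video.
video/x-flv: FLV video.

HTTP (HyperText Transfer Protocol): relies on TCP/IP to move its data across the network.

TCP (Transmission Control Protocol): makes data transfer between two hosts reliable. TCP provides ordering (data arrives in the order it was sent) and retransmission (a packet is resent if its acknowledgement times out).

IP (Internet Protocol): delivers packets from the source to the destination computer, without reliability guarantees.

Figure 2.2 The four layers of TCP/IP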

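2.2 URLs and access steps

The port follows the host name, for example localhost:8081/login. When it is the default port (80 for HTTP, 443 for HTTPS) it can be omitted.

2.2.1 How a browser fetches a server resource, step by step

1. The browser parses the server's host name and port from the URL (if no port is given, the default is used: 80 for HTTP).
2. The browser resolves the host name to the server's IP address.
3. Using the IP address and port, the browser opens a TCP connection to the server.
4. The browser sends an HTTP request message to the server.
5. The server returns the matching HTTP response message.
6. The browser receives and parses the response message.
7. The connection is closed.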
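Step 1 above (extracting host and port, and falling back to the default port) can be sketched with java.net.URI from the JDK. The class name UrlParts and the helper effectivePort are illustrative assumptions.

```java
import java.net.URI;

public class UrlParts {

    // Returns the explicit port, or the scheme's default (443 for https, otherwise 80).
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {        // getPort() returns -1 when the URL has no port
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        URI local = URI.create("http://localhost:8081/login");
        System.out.println(local.getHost() + " " + effectivePort(local) + " " + local.getPath());

        URI baidu = URI.create("https://www.baidu.com/");
        System.out.println(baidu.getHost() + " " + effectivePort(baidu));
    }
}
```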

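2.3 Messages

Messages are either request messages or response messages. A request message contains the request method, the requested URL, the protocol version and the request headers. A response message contains the protocol version, the status code, the response headers and the response body.

2.4 HTTP request methods

When a client sends a request to a server, it must choose a request method (also called a verb). The method states how the resource identified by the URL should be handled, and the server responds differently depending on the method. The two methods crawlers use most are GET and POST.

GET: requests a specific resource from the server. The most common method; parameters travel in the URL, so it is less secure. Most crawlers use it.
POST: submits data to the server for processing. Commonly used for form submission.
HEAD: like GET, but only the headers of the resource are returned, not the body.
PUT: replaces the target content with the data sent by the client.
DELETE: asks the server to delete the specified resource.
CONNECT: when the client is configured with a proxy, establishes a tunnel between client and server through that proxy.
OPTIONS: asks which request methods the server supports, letting the client inspect the server's capabilities.
TRACE: traces a message on its way to the server, possibly through proxies.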

2.5 HTTP status codes
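Status codes are three-digit numbers grouped into five classes by their first digit: 1xx informational, 2xx success, 3xx redirection, 4xx client error and 5xx server error. A minimal sketch of that grouping, useful when a crawler decides whether to parse, retry or skip a response; the class name StatusClass and the method describe are illustrative assumptions.

```java
public class StatusClass {

    // Maps a status code to its class using the first digit.
    static String describe(int code) {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        int[] codes = {200, 302, 403, 504};
        for (int code : codes) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```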

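2.6 HTTP headers

HTTP headers, also called header fields, are one of the building blocks of an HTTP message and carry important additional information.

In crawlers we often rotate several User-Agent and Referer values in the request headers to imitate human behaviour and get past some sites' anti-crawling measures.

2.6.1 General headers

Field	Purpose
Cache-Control	The caching rules that requests and responses follow
Connection	Options for the connection of this request or response, for example whether it should be persistent
Date	The time the HTTP message was created, i.e. when it was sent
Pragma	Implementation-specific directives; usually Pragma: no-cache
Trailer	Names the fields that appear after the body of a chunked message
Transfer-Encoding	The encoding used to transfer the message body
Upgrade	Lets the client or server propose switching to another protocol
Via	Tracks the path of requests and responses between client and server (gateways, proxies, etc.)
Warning	Warnings for the user, mostly about caching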

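Details for the fields above:

(1) Cache-Control:

Table 2.1 Cache-Control request directives

Directive	Description
no-cache	Tells caches not to answer directly from cache, to prevent an expired resource being returned
no-store	Signals that the request or response may contain confidential information; nothing in it may be cached
max-age	The client accepts a response whose age does not exceed the given number of seconds
max-stale	The client accepts an expired response, optionally limited to one that expired no more than the given number of seconds ago
min-fresh	The client wants a response that will stay fresh for at least the given number of seconds
only-if-cached	The client wants a response only if the cache already holds one for the target resource

Table 2.2 Cache-Control response directives

Directive	Description
public	Any cache may store the response
private	Only a cache for one specific user may store the response
no-cache	The cached copy may not be used directly; it must be validated with the server first
no-store	Signals that the request or response may contain confidential information; neither may be cached
no-transform	The resource must not be converted or transformed
max-age	Tells the client the resource is fresh for the given number of seconds, so no new request is needed in that time
must-revalidate	May be cached, but once expired it must be validated with the origin server; if validation fails, 504 is returned
proxy-revalidate	Requires intermediate caches (such as proxies) to re-validate a cached response

Examples of request directives:

Cache-Control: no-cache
Cache-Control: no-store
Cache-Control: max-age=0
Cache-Control: max-stale=3000
Cache-Control: min-fresh=60
Cache-Control: only-if-cached

Examples of response directives:

Cache-Control: must-revalidate
Cache-Control: no-cache=Location
Cache-Control: no-store
Cache-Control: no-transform
Cache-Control: public
Cache-Control: private
Cache-Control: proxy-revalidate
// Several directives can be combined: Cache-Control: private, s-maxage=0, max-age=0, must-revalidate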
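A combined Cache-Control value like the one above can be split into its directives with a few lines of code. This is a simplified sketch: it assumes directives are separated by commas and does not handle quoted values that themselves contain commas. The class name CacheControlParser is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class CacheControlParser {

    // Returns directive -> value; directives without a value map to an empty string.
    static Map<String, String> parse(String headerValue) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String part : headerValue.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int eq = token.indexOf('=');
            if (eq < 0) {
                directives.put(token.toLowerCase(Locale.ROOT), "");
            } else {
                String name = token.substring(0, eq).trim().toLowerCase(Locale.ROOT);
                directives.put(name, token.substring(eq + 1).trim());
            }
        }
        return directives;
    }

    public static void main(String[] args) {
        System.out.println(parse("private, s-maxage=0, max-age=0, must-revalidate"));
    }
}
```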

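(2) Connection:

Connection: Upgrade // used together with the Upgrade header; marks it as applying to this connection only, so proxies do not forward it
Connection: keep-alive // keep the connection open (versions before HTTP/1.1 do not use persistent connections by default)
Connection: close // close the connection (HTTP/1.1 uses persistent connections by default)

(3) Date: the time the HTTP message was created

Date: Tue, 09 Oct 2018 00:09:08 GMT

(4) Pragma: a general header from versions before HTTP/1.1

Pragma: no-cache // same effect as Cache-Control: no-cache in HTTP/1.1

(5) Trailer: announces in advance which header fields the server appends after the message body.

HTTP/1.1 200 OK
Content-Type: text/html
Transfer-Encoding: chunked
Trailer: Expires // name of the field appended after the body
// HTML content
Expires: Tue, 09 Oct 2018 00:09:08 GMT // the appended field

(6) Transfer-Encoding: the encoding (chunking or compression) applied to the body for transfer

Transfer-Encoding: chunked,compress,deflate,gzip,identity // multiple values allowed

(7) Upgrade: proposes another protocol for the server to switch to (must be sent together with Connection: Upgrade)

Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11

(8) Warning:

Warning: 112 - "cache down" "Tue, 09 Oct 2018 00:09:08 GMT"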

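2.6.2 Request headers

Field	Purpose
Accept	The data types the client can handle
Accept-Charset	The character sets the client can accept
Accept-Encoding	The content encodings the browser can decode
Cookie	When sending a request, the client includes all cookies stored for the request's domain
Host	The host name and port of the requested server, without the protocol
Origin	Where the request comes from: protocol and host name
Referer	The URL of the page the request originated from, including protocol, host name, port, etc.
Upgrade-Insecure-Requests	Signals to the server that the client prefers an encrypted and authenticated response
User-Agent	Name of the application making the request
Accept-Language	The languages the browser can accept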

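2.6.3 Response headers

Field	Purpose
Accept-Ranges	Whether the server accepts range requests for the resource; the value names the unit of the range
Age	Time since the server generated the response, in seconds (non-negative integer); mainly used for caching
Set-Cookie	Used by the server to send a cookie to the client
Server	Names the server software and its version
Vary	Tells a proxy whether to answer from cache or to request the resource from the origin server again

2.6.4 Entity headers

Field	Purpose
Allow	Lists the HTTP methods the resource supports
Content-Encoding	Tells the client how the server encoded the entity data
Content-Language	Tells the client the language of the entity data
Content-Length	Length of the entity data
Content-Location	Location of the resource for the entity data
Content-Range	Byte range of the transferred entity data within the whole resource
Content-Type	Type of the entity data
Expires	Expiry time of the entity data
Last-Modified	Date and time the entity data was last modified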

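2.6.5 Inspecting traffic in Chrome

1. Press F12 to open the developer tools.

2. Refresh the page; the requests appear in the list.

3. Click any request to see its details.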

3. Tools for fetching page content (Jsoup and HttpClient are covered below; URLConnection is left for self-study)

3.1 Using Jsoup

3.1.1 What Jsoup does

Jsoup: an open-source Java library for requesting URLs to fetch page content and for parsing HTML and XML documents.

3.1.2 Requesting a URL

3-1: connect to the site and get the HTML

import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupConnectURL1 {
    public static void main(String[] args) throws IOException {
        // Create the connection. This URL uses plain HTTP, so validateTLSCertificates() is not needed.
        // Note: validateTLSCertificates() was removed after jsoup 1.12. Most HTTPS sites now connect
        // without extra SSL code; section 3.1.8 covers the exceptions.
        Connection connect = Jsoup.connect("http://i.chaoxing.com/");
        // Request the page and get its Document object
        Document document = connect.get();
        // Print the HTML
        System.out.println(document.html());
    }
}

3-2: get the Response first, then the HTML

package com.xp.climb.climb03.jsoup;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.net.URL;

public class JsoupConnectURL2 {
    public static void main(String[] args) throws IOException {
        // Get the response
        Connection.Response response = Jsoup.connect("http://i.chaoxing.com/").method(Connection.Method.GET).execute();
        URL url = response.url();                          // the requested URL
        System.out.println("Requested URL: " + url);
        int statusCode = response.statusCode();            // the status code
        System.out.println("Status code: " + statusCode);
        String contentType = response.contentType();       // the response content type
        System.out.println("Content type: " + contentType);
        String statusMessage = response.statusMessage();   // the status message
        System.out.println("Status message: " + statusMessage);
        // Continue only for status 200
        if (statusCode == 200) {
            // The raw HTML of the response as a string
            String html = new String(response.bodyAsBytes(), "utf-8");
            // The same content parsed into a Document
            Document document = response.parse();
            // Both hold the same data, but the Document output is normalized and pretty-printed
            System.out.println("HTML:" + html);
            System.out.println("Document:" + document);
        }
    }
}

3.1.3 Setting headers and why

Why set headers: to disguise the crawler so that its requests look more like a browser's, which lowers the risk of being blocked by the site.

3-3: set a single request header

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;

public class JsoupConnectHeader {
    public static void main(String[] args) throws IOException {
        Connection connect = Jsoup.connect("http://i.chaoxing.com/");
        // Set one request header
        Connection conheader = connect.header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36");
        Document document = conheader.get();
        System.out.println(document);
    }
}

3-4: set several request headers, including Referer

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.util.*;

public class JsoupConnectHeaderList {
    public static void main(String[] args) throws IOException {
        Connection connect = Jsoup.connect("http://i.chaoxing.com/");
        // Instantiate the static nested class
        Builder builder = new Builder();
        // Host of the requested page; setting it is optional
        builder.host = "i.chaoxing.com";
        // Copy the values from Builder into a Map
        Map<String, String> header = new HashMap<String, String>();
        header.put("Host", builder.host);
        header.put("User-Agent", builder.userAgentList.get(new Random().nextInt(builder.userAgentSize)));
        header.put("Accept", builder.accept);
        header.put("Referer", builder.refererList.get(new Random().nextInt(builder.refererSize)));
        header.put("Accept-Language", builder.acceptLanguage);
        header.put("Accept-Encoding", builder.acceptEncoding);
        // Set the headers
        Connection conheader = connect.headers(header);
        Document document = conheader.get();
        System.out.println(document);
    }

    /**
     * Static nested class that bundles the request header values.
     */
    static class Builder {
        // User-Agent pool; add more entries as needed
        String[] userAgentStrs = {"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"};
        List<String> userAgentList = Arrays.asList(userAgentStrs);
        int userAgentSize = userAgentList.size();
        // Referer pool; add more entries as needed
        String[] refererStrs = {"http://mooc1-1.chaoxing.com/"};
        List<String> refererList = Arrays.asList(refererStrs);
        int refererSize = refererList.size();
        // Accept, Accept-Language and Accept-Encoding
        String accept = "text/html,application/xhtml+xml,application/xml;q=0.9," +
                "image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
        String acceptLanguage = "zh-CN,zh;q=0.9";   // language preference (the two values were swapped before)
        String acceptEncoding = "gzip, deflate";    // supported compression
        String host;
    }
}

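3.1.4 Submitting request parameters

GET: parameters are passed in the URL, usually as "?key1=value1&key2=value2".

POST: parameters usually go in the request body, typically as form data or JSON.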
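To make the GET format above concrete, the sketch below builds a "?key1=value1&key2=value2" query string with URL-encoded keys and values, using only the JDK. Jsoup's data(...) methods handle this for you; the class name QueryStringBuilder is an illustrative assumption. Note that URLEncoder produces form encoding, where a space becomes "+".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringBuilder {

    // Appends the parameters to baseUrl as ?k1=v1&k2=v2, encoding keys and values.
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append(separator)
              .append(encode(entry.getKey()))
              .append('=')
              .append(encode(entry.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("id", "1");
        params.put("action", "ajax");
        System.out.println(build("http://localhost:8084/findById", params));
    }
}
```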

Jsoup provides these methods:

Connection data(String key, String value)
Connection data(String... keyVals)
Connection data(Map<String, String> data)
Connection data(String key, String filename,InputStream inputStream,String contentType)
Connection data(Collection<KeyVal> data)

Examples for the first three, which are the most common:

package com.xp.climb.climb03.jsoup;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class JsoupConnectData {
    public static void main(String[] args) throws IOException {
        Connection connect = Jsoup.connect("http://localhost:8084/findById");
        // Option 1: Connection data(String key, String value)
        // connect.data("id", "1").data("action", "ajax");
        // Option 2: Connection data(String... keyVals)
        // connect.data("id", "1", "action", "ajax");
        // Option 3: Connection data(Map<String, String> data)
        Map<String, String> data = new HashMap<String, String>();
        data.put("id", "1");
        data.put("action", "ajax");
        connect.data(data);
        Connection.Response response = connect.method(Connection.Method.GET).ignoreContentType(true).execute();
        // Parse the returned data into a Document
        Document document = response.parse();
        System.out.println(document);
    }
}

3.1.5 Timeouts

When requesting a URL, Jsoup lets you set a timeout in milliseconds (the default is 30 s).

Connection.Response response = Jsoup.connect("http://i.chaoxing.com/").method(Connection.Method.GET).timeout(3*1000).execute();
Document document = Jsoup.connect("http://i.chaoxing.com/").timeout(10*1000).get();

3.1.6 Using a proxy server

Proxy server: an intermediary between client and server. The client sends its request to the proxy, and the proxy fetches the information the client needs from the server.

Two ways to set a proxy:

package com.xp.climb.climb03.jsoup;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;

public class JsoupConnectProxy1 {
    public static void main(String[] args) throws IOException {
        // Option 1: pass a java.net.Proxy object
        /*
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("171.221.239.11", 808));
        Connection connect = Jsoup.connect("http://i.chaoxing.com/").proxy(proxy);
        */
        // Option 2: pass host and port
        Connection connect = Jsoup.connect("http://i.chaoxing.com/").proxy("171.221.239.11", 808);
        // Execute on the connection that carries the proxy setting
        // (creating a new Jsoup.connect(...) here would bypass the proxy)
        Connection.Response response = connect.method(Connection.Method.GET).timeout(3*1000).execute();
        int statusCode = response.statusCode();   // the status code
        System.out.println("Status code: " + statusCode);
    }
}

3.1.7 Reading the response as a stream (downloading images, PDFs, ...)

To download images, PDFs, archives and similar files with Jsoup, read the response body as a stream and write it to a file.

Purpose: writing the file byte by byte instead of as text.

package com.xp.climb.climb03.jsoup;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;

// Downloads an image
public class JsoupConnectInputstream {
    public static void main(String[] args) throws IOException {
        String imageUrl = "https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fimg.zcool.cn%2Fcommunity%2F01a2485545680f0000019ae9da087c.jpg%401280w_1l_2o_100sh.jpg&refer=http%3A%2F%2Fimg.zcool.cn&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1623163517&t=865ec2d5d0a57412aaa80ae697a43bb5";
        Connection connect = Jsoup.connect(imageUrl);
        Response response = connect.method(Method.GET).ignoreContentType(true).execute();
        System.out.println("Content type: " + response.contentType());
        // Continue only if the request succeeded
        if (response.statusCode() == 200) {
            // Read the response body as a stream
            BufferedInputStream bufferedInputStream = response.bodyStream();
            // Save the image
            saveImage(bufferedInputStream, "D:\\IdeaWork\\爬虫\\src\\main\\java\\com\\xp\\climb\\climb03\\image\\1.png");
        }
    }

    /**
     * Saves an image.
     * @param inputStream the stream to read from
     * @param savePath    the target file path
     * @throws IOException if reading or writing fails
     */
    static void saveImage(BufferedInputStream inputStream, String savePath) throws IOException {
        byte[] buffer = new byte[1024];
        int len;
        // try-with-resources flushes and closes both streams, even on error
        try (BufferedInputStream in = inputStream;
             BufferedOutputStream bufferedOut = new BufferedOutputStream(new FileOutputStream(new File(savePath)))) {
            // Write the image
            while ((len = in.read(buffer, 0, 1024)) != -1) {
                bufferedOut.write(buffer, 0, len);
            }
        }
    }
}
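The manual read/write loop above can also be replaced by Files.copy from java.nio.file, which copies a whole InputStream to a file in one call. This is an alternative sketch, not the book's code. The names StreamSaver, save and roundTrip are assumptions, and a ByteArrayInputStream stands in for the stream returned by response.bodyStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamSaver {

    // Copies the stream to the target file and returns the number of bytes written.
    static long save(InputStream in, Path target) {
        try (InputStream source = in) {   // closes the stream when done
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the bytes to a temporary file, reports the file size, then cleans up.
    static long roundTrip(byte[] data) {
        try {
            Path tmp = Files.createTempFile("download", ".bin");
            save(new ByteArrayInputStream(data), tmp);
            long size = Files.size(tmp);
            Files.delete(tmp);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Bytes saved: " + roundTrip(new byte[] {1, 2, 3}));
    }
}
```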

3.1.8 HTTPS requests (SSL)

About certificate checks here:
With Jsoup 1.11.3 you can write the following, although SSL verification may still fail:
Connection connect = Jsoup.connect("https://www.baidu.com/").validateTLSCertificates(false);

// Note: validateTLSCertificates() was removed after jsoup 1.12. I first concluded that HTTPS connections no longer need any extra SSL code, but that view was one-sided: I had not yet tested an HTTPS site that fails certificate validation. I have now found one, so here is a trust-manager approach as a supplement.

Connection connect = Jsoup.connect("https://www.baidu.com/");

Method:

package com.xp.climb.climb03.jsoup;

import java.io.IOException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupConnectSSLInit {
    public static void main(String[] args) throws IOException {
        initUnSecureTSL();
        String url = "https://www.creditchina.gov.cn/xinyongfuwu/?navPage=5";
        // Create the connection
        Connection connect = Jsoup.connect(url);
        // Request the page
        Document document = connect.get();
        int status = connect.response().statusCode();
        System.out.println(status);
        // Print the HTML
        System.out.println(document.html());
    }

    private static void initUnSecureTSL() {
        // Create a trust manager that does not validate certificates (for experiments only)
        final TrustManager[] trustAllCerts = new TrustManager[]{new X509TrustManager() {
            // Check client certificates
            public void checkClientTrusted(final X509Certificate[] chain, final String authType) {
                // do nothing: accept any client certificate
            }
            // Check server certificates
            public void checkServerTrusted(final X509Certificate[] chain, final String authType) {
                // do nothing: accept any server certificate
            }
            // Return the trusted X509 certificates
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        }};
        try {
            // Create an SSLContext and initialize it with the trust manager
            SSLContext sslContext = SSLContext.getInstance("SSL");
            sslContext.init(null, trustAllCerts, new java.security.SecureRandom());
            // Create the SSL socket factory based on the trust manager
            SSLSocketFactory sslSocketFactory = sslContext.getSocketFactory();
            // Install the SSLSocketFactory as the default for HttpsURLConnection
            HttpsURLConnection.setDefaultSSLSocketFactory(sslSocketFactory);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

3.1.9 Fetching large files

By default Jsoup limits how much of the body it reads (about 1 MB in the author's tests; the exact default depends on the version). Solution: use a longer timeout() and raise the limit with maxBodySize(Integer.MAX_VALUE).

Response response = Jsoup.connect(url).timeout(10*60*1000).maxBodySize(Integer.MAX_VALUE).method(Method.GET).ignoreContentType(true).execute();

package com.xp.climb.climb03.jsoup;

import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import java.io.*;

public class JsoupConnectBodySize1 {
    public static void main(String[] args) throws IOException {
        String url = "http://fs.pc.kugou.com/202105102347/1379b0e4a7868635b4269f6b8f816ebb/KGTX/CLTX001/41babef7c2049b73ed1cd1d8a3dcaab1.mp3";
        // Use a long timeout for the large download
        Response response = Jsoup.connect(url).timeout(10*60*1000).maxBodySize(Integer.MAX_VALUE).method(Method.GET).ignoreContentType(true).execute();
        // Continue only if the request succeeded
        if (response.statusCode() == 200) {
            // Read the response body as a stream
            BufferedInputStream bufferedInputStream = response.bodyStream();
            // Save the file
            saveFile(bufferedInputStream, "D:\\IdeaWork\\爬虫\\src\\main\\java\\com\\xp\\climb\\climb03\\image\\music01.mp3");
        }
    }

    /**
     * Saves a file.
     * @param inputStream the stream to read from
     * @param savePath    the target file path
     * @throws IOException if reading or writing fails
     */
    static void saveFile(BufferedInputStream inputStream, String savePath) throws IOException {
        // Read at most 1 KB at a time
        byte[] buffer = new byte[1024];
        int len;
        // try-with-resources flushes and closes both streams, even on error
        try (BufferedInputStream in = inputStream;
             BufferedOutputStream bufferedOut = new BufferedOutputStream(new FileOutputStream(new File(savePath)))) {
            // Write the file
            while ((len = in.read(buffer, 0, 1024)) != -1) {
                bufferedOut.write(buffer, 0, len);
            }
        }
    }
}

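3.2 Using HttpClient

HttpClient is commonly used to send requests to a server and fetch the response.

3.2.1 HttpClient dependency

<!-- HttpClient: a programmatic API for sending and receiving HTTP messages. -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.5</version>
</dependency>

3.2.2 Requesting a URL

Steps (the first three are the essential ones):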

package com.xp.climb.climb03.HttpClient01;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.http.HttpResponse;
import org.apache.http.ProtocolVersion;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpclientInit01 {
    public static void main(String[] args) throws IOException, URISyntaxException {
        /* (1) Ways to instantiate HttpClient
         * HttpClients.custom() returns HttpClientBuilder.create()
         * HttpClients.createDefault() returns HttpClients.custom().build()
         * See the HttpClients class for details.
         */
        HttpClient httpClient1 = new DefaultHttpClient();   // deprecated since 4.3
        HttpClient httpClient2 = HttpClients.custom().build();
        HttpClient httpClient3 = HttpClientBuilder.create().build();
        CloseableHttpClient httpClient4 = HttpClients.createDefault();
        HttpClient httpClient5 = HttpClients.createSystem();
        HttpClient httpClient6 = HttpClients.createMinimal();
        /* (2) Three ways to create a request, using GET as the example
         * public HttpGet(){super();}   // the URI must be set separately
         * public HttpGet(final URI uri){super(); setURI(uri);}
         * public HttpGet(final String uri){super(); setURI(URI.create(uri));}
         */
        URI uri = new URIBuilder("https://www.w3school.com.cn/b.asp").build();   // build the URI
        HttpGet getMethod = new HttpGet();   // first form of the GET request
        getMethod.setURI(uri);               // set the URI
        /* (3) Execute the request and get the HttpResponse */
        HttpResponse response = httpClient6.execute(getMethod);
        /* (4) Inspect the response */
        System.out.println("response:" + response);
        String status = response.getStatusLine().toString();              // status line
        System.out.println("status:" + status);
        int StatusCode = response.getStatusLine().getStatusCode();         // status code
        System.out.println("StatusCode:" + StatusCode);
        ProtocolVersion protocolVersion = response.getProtocolVersion();   // protocol version
        System.out.println("protocolVersion" + protocolVersion);
        String phrase = response.getStatusLine().getReasonPhrase();        // reason phrase, e.g. OK
        System.out.println("phrase:" + phrase);
        if (StatusCode == 200) {   // 200 means the request succeeded
            // Read the entity content
            String entity = EntityUtils.toString(response.getEntity(), "gbk");
            // Print the entity content
            System.out.println(entity);
        }
        // Release the entity's underlying stream in either case
        EntityUtils.consume(response.getEntity());
    }
}

The first three steps written another way (passing an HttpContext):

// Initialize the HttpContext
HttpContext localContext = new BasicHttpContext();
String url = "http://www.w3school.com.cn/b.asp";
// (1) Initialize the HttpClient
HttpClient httpClient = HttpClients.custom().build();
// (2) Create the request
HttpGet httpGet = new HttpGet(url);
// (3) Execute the request and obtain the HttpResponse
HttpResponse httpResponse = null;
try {
    httpResponse = httpClient.execute(httpGet, localContext);
} catch (IOException e) {
    e.printStackTrace();
}
// Then read the response details as in step (4) above

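3.2.3 The EntityUtils Class

Purpose: working with the response entity.

String entity = EntityUtils.toString(response.getEntity(), "gbk"); // decode the entity as a string using GBK when the response does not declare a charset

String entity = EntityUtils.toString(response.getEntity()); // no charset argument: uses the charset from the Content-Type header, falling back to ISO-8859-1 if none is declared

Also:
byte[] bytes = EntityUtils.toByteArray(response.getEntity()); // read the entity into a byte array
Release the resources when you are done:

// Close the entity's content stream
EntityUtils.consume(response.getEntity()); // consume the entity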
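To make the effect of the charset argument concrete, here is a minimal sketch that uses only the JDK. It does not call EntityUtils; the class name CharsetDecodeSketch and the method decode are made up for this illustration, and the byte array simply stands in for a GBK-encoded response body.

```java
import java.nio.charset.Charset;

public class CharsetDecodeSketch {

    /** Decodes raw body bytes with the named charset (a stand-in for passing a charset to EntityUtils.toString). */
    public static String decode(byte[] body, String charsetName) {
        return new String(body, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character Chinese word for "Chinese", encoded here as GBK (4 bytes).
        byte[] body = "\u4e2d\u6587".getBytes(Charset.forName("GBK"));

        // Decoding with the charset the page was written in recovers the text.
        System.out.println(decode(body, "GBK"));
        // Decoding with ISO-8859-1 maps each byte to one character, so the text is garbled.
        System.out.println(decode(body, "ISO-8859-1"));
    }
}
```

This is why the examples in these notes pass "gbk" explicitly for pages that do not declare their encoding.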

3.2.4 Setting Request Headers

Setting request headers with HttpClient:

package com.xp.climb.climb03.HttpClient01;

import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpclientSetHeader1 {
    public static void main(String[] args) throws IOException {
        HttpClient httpClient = HttpClients.custom().build(); // initialize the HttpClient
        HttpGet httpget = new HttpGet("http://www.w3school.com.cn/b.asp"); // request method
        // Request header configuration
        httpget.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
        httpget.setHeader("Accept-Encoding", "gzip, deflate");
        httpget.setHeader("Accept-Language", "zh-CN,zh;q=0.9");
        httpget.setHeader("Cache-Control", "max-age=0");
        httpget.setHeader("Host", "www.w3school.com.cn");
        // This header is important: many sites reject requests without a browser-like User-Agent
        httpget.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36");
        HttpResponse response = httpClient.execute(httpget); // send the GET request
        int code = response.getStatusLine().getStatusCode(); // response status code
        HttpEntity httpEntity = response.getEntity();        // page content stream
        String entity = EntityUtils.toString(httpEntity, "gbk"); // as a string (charset must be set)
        System.out.println(code + "\n" + entity); // print what was fetched
        EntityUtils.consume(httpEntity);          // close the content stream
    }
}
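For comparison, the same idea can be sketched with the JDK's own HttpURLConnection, which these notes leave for self-study. This is only an illustration under assumptions: the class name UrlConnectionHeaderSketch, the method prepare, and the short User-Agent string are hypothetical, and the request is prepared but never sent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class UrlConnectionHeaderSketch {

    /** Prepares (but does not send) a GET request that carries the given User-Agent. */
    public static HttpURLConnection prepare(String url, String userAgent) {
        try {
            // openConnection() only creates the connection object; no network traffic happens yet.
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("User-Agent", userAgent);
            conn.setRequestProperty("Accept-Language", "zh-CN,zh;q=0.9");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical User-Agent value, used only for this demonstration.
        HttpURLConnection conn = prepare("http://www.w3school.com.cn/b.asp",
                "Mozilla/5.0 (compatible; DemoCrawler/1.0)");
        System.out.println(conn.getRequestProperty("User-Agent"));
        // Calling conn.getResponseCode() here would actually send the request.
    }
}
```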

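3.2.5 Submitting a Form with POST

Crawlers often need to submit forms, for example to simulate a login.

HttpClient provides the entity class UrlEncodedFormEntity for form submission. It URL-encodes the parameters and produces content like this:

param1=value1&param2=value2

The real login request URL has to be looked up in the browser's developer tools (F12).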
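The param1=value1&param2=value2 format can be reproduced with the JDK alone. The sketch below is not how UrlEncodedFormEntity is implemented; it only shows the shape of the encoded body. The class name FormEncodingSketch, the method encode, and the sample e-mail and password are assumptions made for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormEncodingSketch {

    /** Joins the parameters as name=value pairs separated by '&', URL-encoding each name and value. */
    public static String encode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            // URLEncoder.encode(String, Charset) requires Java 10 or later.
            body.append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> form = new LinkedHashMap<>(); // keeps insertion order
        form.put("email", "user@example.com");            // sample values, not real credentials
        form.put("password", "p&ss word");
        System.out.println(encode(form));
        // prints: email=user%40example.com&password=p%26ss+word
    }
}
```

Special characters inside a value ('@', '&', spaces) are escaped, so they cannot be confused with the separators between parameters.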

package com.xp.climb.climb03.HttpClient01;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.ParseException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class HttpclientRenren {
    public static void main(String[] args) throws ParseException, IOException {
        HttpClient httpclient = HttpClients.custom().build(); // initialize the HttpClient
        String renRenLoginURL = "http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=2021431030"; // login URL
        HttpPost httpost = new HttpPost(renRenLoginURL); // use the POST method
        // Build a list of NameValuePair objects holding the parameters to send
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("email", "******"));    // your e-mail address
        nvps.add(new BasicNameValuePair("password", "******")); // your password
        try {
            // Submit the form parameters
            httpost.setEntity(new UrlEncodedFormEntity(nvps, StandardCharsets.UTF_8));
            HttpResponse response = httpclient.execute(httpost);
            System.out.println(response.getStatusLine());
            // Read the body before releasing the connection; mind the charset
            String entityString = EntityUtils.toString(response.getEntity(), "gbk");
            System.out.println(entityString);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            httpost.releaseConnection(); // release the connection
        }

        // What to request after logging in; here, my own friends page
        HttpGet httpget = new HttpGet("http://www.renren.com/465530468/profile?v=info_timeline");
        // Build a ResponseHandler; it reads the body and releases the connection itself
        ResponseHandler<String> responseHandler = new BasicResponseHandler();
        String responseBody = null;
        try {
            responseBody = httpclient.execute(httpget, responseHandler);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Print the fetched content
        System.out.println(responseBody);
    }
}

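3.2.6 Timeout Settings

HttpClient supports three timeouts. They are configured through RequestConfig.custom(), which returns a Builder; all values are in milliseconds.

ConnectionRequestTimeout (how long to wait for a connection from the connection manager)
ConnectTimeout (how long to wait for the connection to be established)
SocketTimeout (how long to wait for data)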

package com.xp.climb.climb03.HttpClient01;

import java.io.IOException;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpclientConnectionTime {
    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Instantiate the HttpClient
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // HTTP GET request (POST works the same way)
        HttpGet httpGet = new HttpGet("http://www.w3school.com.cn/b.asp");
        // Set the connection-request, connect and socket timeouts (milliseconds)
        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectionRequestTimeout(2000)
                .setConnectTimeout(2000)
                .setSocketTimeout(2000)
                .build();
        httpGet.setConfig(requestConfig);
        // Execute the request and obtain the response
        HttpResponse response = httpClient.execute(httpGet);
        // Convert the response entity to a string
        String result = EntityUtils.toString(response.getEntity(), "gbk");
        System.out.println(result);
        httpClient.close(); // release the client's resources
    }
}
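As a rough JDK-only counterpart, HttpURLConnection exposes a connect timeout and a read timeout, which play roles similar to ConnectTimeout and SocketTimeout above. It has no equivalent of the connection-request timeout because it has no connection pool to wait on. The class name UrlConnectionTimeoutSketch and the method withTimeouts are hypothetical, and the sketch never sends the request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class UrlConnectionTimeoutSketch {

    /** Creates an unsent connection with the given connect and read timeouts in milliseconds. */
    public static HttpURLConnection withTimeouts(String url, int connectMillis, int readMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMillis); // limit on establishing the TCP connection
            conn.setReadTimeout(readMillis);       // limit on waiting for data once connected
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = withTimeouts("http://www.w3school.com.cn/b.asp", 2000, 2000);
        System.out.println("connect=" + conn.getConnectTimeout() + "ms, read=" + conn.getReadTimeout() + "ms");
        // prints: connect=2000ms, read=2000ms
    }
}
```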

3.2.7 Using a Proxy Server

A proxy is configured the same way as the timeouts, through RequestConfig.custom():

package com.xp.climb.climb03.HttpClient01;

import java.io.IOException;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpclientProxy2 {
    public static void main(String[] args) throws ClientProtocolException, IOException {
        HttpClient httpClient = HttpClients.custom().build(); // instantiate the HttpClient
        // Configure the proxy (host, port, scheme; a null scheme means http)
        HttpHost proxy = new HttpHost("171.221.239.11", 808, null);
        RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
        HttpGet httpGet = new HttpGet("http://www.w3school.com.cn/b.asp");
        httpGet.setConfig(config); // apply the proxy to this request
        HttpResponse httpResponse = httpClient.execute(httpGet);
        if (httpResponse.getStatusLine().getStatusCode() == 200) {
            String result = EntityUtils.toString(httpResponse.getEntity(), "gbk");
            System.out.println(result);
        }
    }
}

Problem encountered (the proxy server was not set up):

3.2.8 Downloading Files

package com.xp.climb.climb03.HttpClient01;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpclientDownloadFile {
    public static void main(String[] args) throws IOException {
        String url = "http://fs.pc.kugou.com/202105121125/8ecb6e75e8efc1b8724c6e29b9d8bee4/KGTX/CLTX001/1e677ec385c5ec7a7485348301bd8937.mp3";
        HttpClient httpClient = HttpClients.custom().build(); // initialize the HttpClient
        HttpGet httpGet = new HttpGet(url);
        // Execute the request; an IOException propagates instead of leaving a null response behind
        HttpResponse httpResponse = httpClient.execute(httpGet);
        /*
         * A very simple way to download a file
         */
        // Define the output file; try-with-resources closes the stream when the copy finishes
        try (OutputStream out = new FileOutputStream("D:\\IdeaWork\\爬虫\\src\\main\\java\\com\\xp\\climb\\climb03\\image\\music02.mp3")) {
            // HttpEntity.writeTo() writes the entity straight into the given output stream
            httpResponse.getEntity().writeTo(out);
        }
        EntityUtils.consume(httpResponse.getEntity()); // consume the entity
    }
}
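The copy step itself (stream in, file out) can be tried without any network access. The sketch below uses the JDK's Files.copy on an in-memory stream that stands in for the response body; the class name StreamToFileSketch and the method save are made up for this example and are not part of HttpClient.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToFileSketch {

    /** Copies the whole stream into the target file, closes the stream, and returns the number of bytes written. */
    public static long save(InputStream body, Path target) {
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("download-", ".bin");
        // Four bytes stand in for the content of a downloaded file.
        long written = save(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

With a real response, the stream passed to save would come from the entity's getContent() method.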

3.2.9 HTTPS Request Authentication (required by some sites)

Main test class:

package com.xp.climb.climb03.HttpClient01.ssl;

import java.io.IOException;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.ParseException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Test {
    public static void main(String[] args) throws ParseException, IOException {
        String url = "https://www.creditchina.gov.cn/xinyongfuwu/?navPage=5";
        SSLClient sslClient = new SSLClient(); // instantiate
        HttpClient httpClientSSL = sslClient.initSSLClient("TLS");
//      SSLUtil sslUtil = new SSLUtil();
//      try {
//          SSLUtil.ignoreSsl();
//      } catch (Exception e) {
//          e.printStackTrace();
//      }
//      HttpClient httpClientSSL = SSLClient.createSSLClientDefault();
        HttpGet httpGet = new HttpGet(url);
        // Without this header the server answers with 403!
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36");
        // Execute the request
        HttpResponse httpResponse;
        try {
            httpResponse = httpClientSSL.execute(httpGet);
        } catch (IOException e) {
            e.printStackTrace();
            return; // nothing to read if the request failed
        }
        System.out.println(httpResponse.getStatusLine().getStatusCode());
        if (httpResponse.getStatusLine().getStatusCode() == HttpStatus.SC_OK) { // 200 means success
            // Read and print the entity body
            String entity = EntityUtils.toString(httpResponse.getEntity(), "UTF-8");
            System.out.println(entity);
        }
        // Consume the entity so its stream is closed
        EntityUtils.consume(httpResponse.getEntity());
    }
}

SSL utility class:

package com.xp.climb.climb03.HttpClient01.ssl;

import java.security.KeyManagementException;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.Arrays;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.X509TrustManager;
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.AuthSchemes;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.conn.socket.PlainConnectionSocketFactory;
import org.apache.http.conn.ssl.NoopHostnameVerifier;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.conn.ssl.TrustStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.ssl.SSLContextBuilder;

// Note: both methods below accept every certificate and skip hostname checks.
// That removes the protection HTTPS normally gives, so use them for experiments only.
public class SSLClient {

    public static CloseableHttpClient createSSLClientDefault() {
        try {
            // loadTrustMaterial() with a trust strategy that trusts every certificate
            SSLContext sslContext = new SSLContextBuilder().loadTrustMaterial(null, new TrustStrategy() {
                // Trust everything
                public boolean isTrusted(X509Certificate[] chain, String authType) throws CertificateException {
                    return true;
                }
            }).build();
            // NoopHostnameVerifier effectively turns hostname verification off:
            // it accepts any valid SSL session regardless of the target host.
            HostnameVerifier hostnameVerifier = NoopHostnameVerifier.INSTANCE;
            SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(sslContext, hostnameVerifier);
            return HttpClients.custom().setSSLSocketFactory(sslsf).build();
        } catch (KeyManagementException e) {
            e.printStackTrace();
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        } catch (KeyStoreException e) {
            e.printStackTrace();
        }
        return HttpClients.createDefault();
    }

    /**
     * Configures an HttpClient for SSL.
     * @param SSLProtocolVersion one of SSL, SSLv3, TLS, TLSv1, TLSv1.1, TLSv1.2
     * @return httpClient
     */
    public HttpClient initSSLClient(String SSLProtocolVersion) {
        RequestConfig defaultConfig = null;
        PoolingHttpClientConnectionManager pcm = null;
        try {
            X509TrustManager xtm = new SSL509TrustManager(); // create the trust manager
            // Create the SSLContext and initialize it with that trust manager
            SSLContext context = SSLContext.getInstance(SSLProtocolVersion);
            context.init(null, new X509TrustManager[]{xtm}, null);
            // Build an SSLConnectionSocketFactory from the SSLContext;
            // NoopHostnameVerifier.INSTANCE accepts any valid SSL session for any host
            SSLConnectionSocketFactory sslConnectionSocketFactory = new SSLConnectionSocketFactory(context, NoopHostnameVerifier.INSTANCE);
            Registry<ConnectionSocketFactory> sfr = RegistryBuilder.<ConnectionSocketFactory>create()
                    .register("http", PlainConnectionSocketFactory.INSTANCE)
                    .register("https", sslConnectionSocketFactory)
                    .build();
            // Create the connection pool from this registry
            pcm = new PoolingHttpClientConnectionManager(sfr);
        } catch (NoSuchAlgorithmException | KeyManagementException e) {
            e.printStackTrace();
        }
        // Global request configuration: cookie spec, HTTP authentication, timeouts
        defaultConfig = RequestConfig.custom()
                .setCookieSpec(CookieSpecs.STANDARD_STRICT)
                .setExpectContinueEnabled(true)
                .setTargetPreferredAuthSchemes(Arrays.asList(AuthSchemes.NTLM, AuthSchemes.DIGEST))
                .setProxyPreferredAuthSchemes(Arrays.asList(AuthSchemes.BASIC))
                .setConnectionRequestTimeout(30 * 1000)
                .setConnectTimeout(30 * 1000)
                .setSocketTimeout(30 * 1000)
                .build();
        // Initialize the HttpClient
        HttpClient httpClient = HttpClients.custom()
                .setConnectionManager(pcm)
                .setDefaultRequestConfig(defaultConfig)
                .build();
        return httpClient;
    }

    // Implementation of the X509TrustManager interface
    private static class SSL509TrustManager implements X509TrustManager {
        // Check client certificates
        public void checkClientTrusted(X509Certificate[] x509Certificates, String s) {
            // do nothing: accept any client certificate
        }
        // Check server certificates
        public void checkServerTrusted(X509Certificate[] x509Certificates, String s) {
            // do nothing: accept any server certificate
        }
        // Return the trusted X509 certificates
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }
}

3.2.10 Request Retry

package com.xp.climb.climb03.HttpClient01.retry;

import java.io.IOException;
import java.util.Arrays;
import org.apache.http.HttpResponse;
import org.apache.http.ParseException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.AuthSchemes;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpclientRetry {
    public static void main(String[] args) throws ParseException, IOException {
        // Configuration
        RequestConfig defaultConfig = RequestConfig.custom()
                .setCookieSpec(CookieSpecs.STANDARD_STRICT)
                .setExpectContinueEnabled(true)
                .setTargetPreferredAuthSchemes(Arrays.asList(AuthSchemes.NTLM, AuthSchemes.DIGEST))
                .setProxyPreferredAuthSchemes(Arrays.asList(AuthSchemes.BASIC))
                .setConnectionRequestTimeout(10 * 1000)
                .setConnectTimeout(5 * 1000)
                .setSocketTimeout(5 * 1000)
                .build();
        // Default handler: 3 retries
        HttpClient httpClient = HttpClients.custom()
                .setDefaultRequestConfig(defaultConfig)
                .setRetryHandler(new DefaultHttpRequestRetryHandler())
                .build();
        // Custom retry count
        /* HttpClient httpClient = HttpClients.custom()
                .setDefaultRequestConfig(defaultConfig)
                .setRetryHandler(new DefaultHttpRequestRetryHandler(5, true))
                .build(); */
        HttpGet httpGet = new HttpGet("https://mashable.com/category/twitter/");
        HttpResponse response;
        try {
            response = httpClient.execute(httpGet); // execute the request
        } catch (Exception e) {
            e.printStackTrace();
            return; // all attempts failed, so there is no response to read
        }
        String result = EntityUtils.toString(response.getEntity(), "UTF-8"); // page HTML (this site serves UTF-8)
        System.out.println(result);                 // print the result
        EntityUtils.consume(response.getEntity());  // consume the entity
    }
}
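The retry idea can be sketched in plain Java as a loop that repeats a task when it fails with an IOException. This is an illustration only, not the logic of DefaultHttpRequestRetryHandler, which also declines to retry certain exception types and requests. The names RetrySketch, IoTask and executeWithRetry are hypothetical, and the failing task is simulated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class RetrySketch {

    /** A task that may fail with an IOException, such as an HTTP request. */
    public interface IoTask<T> {
        T run() throws IOException;
    }

    /** Runs the task once and, on IOException, up to maxRetries more times before giving up. */
    public static <T> T executeWithRetry(IoTask<T> task, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            try {
                return task.run();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new UncheckedIOException("gave up after " + (maxRetries + 1) + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request: fails twice, then succeeds on the third attempt.
        String body = executeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated network failure " + calls[0]);
            }
            return "<html>ok</html>";
        }, 3);
        System.out.println(body + " after " + calls[0] + " attempts");
        // prints: <html>ok</html> after 3 attempts
    }
}
```

With maxRetries set to 3, a task that always fails is attempted four times in total: the initial call plus three retries.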

3.2.11 Executing Requests with Multiple Threads

package com.xp.climb.climb03.HttpClient01.thread;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import com.xp.climb.climb03.HttpClient01.ssl.SSLClient;
import org.apache.http.Consts;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.AuthSchemes;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.config.ConnectionConfig;
import org.apache.http.config.SocketConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        // Connection parameters
        ConnectionConfig connectionConfig = ConnectionConfig.custom()
                .setMalformedInputAction(CodingErrorAction.IGNORE)
                .setUnmappableInputAction(CodingErrorAction.IGNORE)
                .setCharset(Consts.UTF_8)
                .build();
        // Socket parameters
        SocketConfig socketConfig = SocketConfig.custom().setTcpNoDelay(true).build();
        // Configure the pooling connection manager
        PoolingHttpClientConnectionManager pcm = new PoolingHttpClientConnectionManager();
        pcm.setMaxTotal(100);                             // maximum total connections
        pcm.setDefaultMaxPerRoute(10);                    // maximum connections per route
        pcm.setDefaultConnectionConfig(connectionConfig); // connection settings
        pcm.setDefaultSocketConfig(socketConfig);         // socket settings
        // Global request configuration: cookie spec, HTTP authentication, timeouts
        RequestConfig defaultConfig = RequestConfig.custom()
                .setCookieSpec(CookieSpecs.STANDARD_STRICT)
                .setExpectContinueEnabled(true)
                .setTargetPreferredAuthSchemes(Arrays.asList(AuthSchemes.NTLM, AuthSchemes.DIGEST))
                .setProxyPreferredAuthSchemes(Arrays.asList(AuthSchemes.BASIC))
                .setConnectionRequestTimeout(30 * 1000)
                .setConnectTimeout(30 * 1000)
                .setSocketTimeout(30 * 1000)
                .build();
        CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(pcm)
                .setDefaultRequestConfig(defaultConfig)
                .build();
        // HTTPS: switch to the client that passes SSL authentication.
        // Note that this replaces the pooled client built above, so its pool settings are not used.
        httpClient = SSLClient.createSSLClientDefault();
        // URLs to request
        String[] urlArr = {
                "https://www.creditchina.gov.cn/xinyongfuwu/?navPage=5",
                "http://www.w3school.com.cn/html/index.asp",
                "http://www.w3school.com.cn/html/html_basic.asp",
                "http://www.w3school.com.cn/html/html_elements.asp",
                "http://www.w3school.com.cn/html/html_attributes.asp",
                "http://www.w3school.com.cn/html/html_formatting.asp"};
        // Create a fixed-size thread pool
        ExecutorService exec = Executors.newFixedThreadPool(3);
        for (int i = 0; i < urlArr.length; i++) {
            // One output file per URL, so the threads do not overwrite each other
            // (the original experiment used the fixed name "credit.asp")
            String filename = "page_" + i + ".html";
            // Open the output file for the HTML
            OutputStream out = new FileOutputStream("D:\\IdeaWork\\爬虫\\src\\main\\java\\com\\xp\\climb\\climb03\\image\\" + filename);
            HttpGet httpget = new HttpGet(urlArr[i]);
            // Key step: set the User-Agent request header
            httpget.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36");
            // This only submits the task; the HTTP request itself is executed inside run()
            exec.execute(new DownHtmlFileThread(httpClient, httpget, out));
        }
        // Shut the pool down once the submitted tasks have finished
        exec.shutdown();
    }

    static class DownHtmlFileThread extends Thread {
        private final CloseableHttpClient httpClient;
        private final HttpContext context;
        private final HttpGet httpget;
        private final OutputStream out;

        // Constructor parameters
        public DownHtmlFileThread(CloseableHttpClient httpClient, HttpGet httpget, OutputStream out) {
            this.httpClient = httpClient;
            this.context = HttpClientContext.create();
            this.httpget = httpget;
            this.out = out;
        }

        @Override
        public void run() {
            System.out.println(Thread.currentThread().getName() + " requesting URL: " + httpget.getURI());
            try {
                CloseableHttpResponse response = httpClient.execute(httpget, context); // execute the request
                try {
                    // Write the HTML to the file as raw bytes, keeping the page's own encoding
                    response.getEntity().writeTo(out);
                    // Consume the entity
                    EntityUtils.consume(response.getEntity());
                } finally {
                    response.close(); // close the response
                    out.close();      // close the output file
                }
            } catch (ClientProtocolException ex) {
                ex.printStackTrace(); // protocol error
            } catch (IOException ex) {
                ex.printStackTrace(); // I/O error
            }
        }
    }
}
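The threading pattern used above (a fixed-size pool, one task per URL, one output file per task) can be tried without the network. In the sketch below the HTTP call is replaced by a hypothetical fetch method that just returns a string; PoolDownloadSketch, fetch and downloadAll are names invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolDownloadSketch {

    /** Hypothetical stand-in for the HTTP request; a real crawler would fetch the URL here. */
    static String fetch(String url) {
        return "<html><!-- content of " + url + " --></html>";
    }

    /** Saves each URL to its own file (page-0.html, page-1.html, ...) and reports whether all tasks finished in time. */
    public static boolean downloadAll(List<String> urls, Path dir, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < urls.size(); i++) {
            final String url = urls.get(i);
            final Path target = dir.resolve("page-" + i + ".html"); // one file per task
            pool.execute(() -> {
                try {
                    Files.write(target, fetch(url).getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown(); // no new tasks are accepted; queued tasks still run
        try {
            return pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pages-");
        List<String> urls = Arrays.asList(
                "http://www.w3school.com.cn/html/index.asp",
                "http://www.w3school.com.cn/html/html_basic.asp");
        System.out.println("finished=" + downloadAll(urls, dir, 3) + ", files in " + dir);
    }
}
```

Calling awaitTermination after shutdown makes the caller wait for the queued downloads, which is useful when the results are needed right after the loop.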
