Web Crawler Beginner Series (3): HttpClient

The previous post covered Java code that sets request headers and fetches a web page with Jsoup.

This post shows Java code that fetches a web page with HttpClient.

First, download HttpClient from the official site. Version 4.5.5 is used here:

http://mirror.bit.edu.cn/apache//httpcomponents/httpclient/binary/httpcomponents-client-4.5.5-bin.zip

Add the following JARs to the project:

commons-logging-1.2.jar
httpclient-4.5.5.jar
httpcore-4.4.9.jar
httpmime-4.5.5.jar

Create a new class, httpClientConnection, with the following code:

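import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.entity.ContentType;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class httpClientConnection {
    public static void main(String[] args) {
        // Send the request with the GET method
        HttpGet httpGet = new HttpGet("http://www.cnblogs.com/szw-blog/p/8565944.html");

        // try-with-resources closes the response and the client exactly once,
        // and avoids a NullPointerException if execute() fails before a response exists
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             CloseableHttpResponse responseGet = httpclient.execute(httpGet)) {
            // Status line of the server's response
            System.out.println(responseGet.getStatusLine());
            // Message body of the response (without the HTTP headers)
            HttpEntity entity = responseGet.getEntity();

            if (entity != null) {
                // Charset declared by the response; null if the server did not send one
                ContentType contentType = ContentType.getOrDefault(entity);
                Charset charset = contentType.getCharset();
                if (charset == null) {
                    // Assumption: fall back to UTF-8 when no charset is declared
                    charset = StandardCharsets.UTF_8;
                }
                // Wrap the InputStream in a buffered reader so the body can be read line by line
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(entity.getContent(), charset))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}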
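
The listing resolves the response charset before decoding the body. When the server's Content-Type carries no charset parameter, contentType.getCharset() returns null, so the decoder needs a fallback. The following is a minimal JDK-only sketch of that idea, not HttpClient's own implementation; the class CharsetFallbackDemo, the helpers charsetOf and readAll, and the UTF-8 fallback are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackDemo {

    // Hypothetical helper: pick the charset parameter out of a Content-Type value,
    // returning the fallback when the parameter is missing or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Hypothetical helper: decode a stream line by line with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A UTF-8 body served with a Content-Type that declares no charset
        byte[] body = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        Charset cs = charsetOf("text/html", StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(body), cs));
    }
}
```

UTF-8 is only a guess at what an undeclared page uses; for sites known to serve another encoding, pass that charset as the fallback instead.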

Running httpClientConnection prints the response status line, followed by the HTML source of the page.

That is the basic code for fetching a web page with HttpClient in Java.
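
The request in the listing goes out with HttpClient's default headers. Crawlers often set headers such as User-Agent explicitly, as the Jsoup post in this series did. Below is a minimal sketch of doing so, assuming the HttpClient 4.5.5 JARs listed earlier are on the classpath; the class name HttpClientHeaders, the helper buildRequest, and the header values are illustrative assumptions, not taken from the original post.

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientHeaders {

    // Hypothetical helper: build a GET request carrying browser-like headers.
    // The header values are examples only.
    static HttpGet buildRequest(String url) {
        HttpGet httpGet = new HttpGet(url);
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        httpGet.setHeader("Accept", "text/html,application/xhtml+xml");
        httpGet.setHeader("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8");
        return httpGet;
    }

    public static void main(String[] args) throws Exception {
        HttpGet httpGet = buildRequest("http://www.cnblogs.com/szw-blog/p/8565944.html");
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             CloseableHttpResponse response = httpclient.execute(httpGet)) {
            System.out.println(response.getStatusLine());
            if (response.getEntity() != null) {
                // Decode the body, using UTF-8 when the response declares no charset
                System.out.println(EntityUtils.toString(response.getEntity(), "UTF-8"));
            }
        }
    }
}
```

setHeader replaces any existing header with the same name, so calling it again with a different User-Agent overrides the first value rather than adding a second one.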

Reposted from: https://www.cnblogs.com/szw-blog/p/8569925.html
