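Recently I was scraping product listings from a shopping website. The browser and Postman could fetch the content every time, but my Java code failed with "Connection refused: connect" no matter what I tried. At first I assumed the Java code itself was wrong. Then it occurred to me that Java runs on its own virtual machine (the JVM), so the proxy might have to be configured in code. That turned out to be the cause: the proxy was switched on, but the JVM does not use the operating system's proxy settings by default, so Java never routed its traffic through the proxy on its own.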

The connection failure produced the following error:

java.net.ConnectException: Connection refused: connect
    at java.net.DualStackPlainSocketImpl.connect0(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient$1.run(Unknown Source)
    at sun.net.www.http.HttpClient$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.http.HttpClient.privilegedOpenServer(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(Unknown Source)
    at cn.renkai721.tools.Tools.getHttps(Tools.java:120)
    at cn.renkai721.tools.Tools.main(Tools.java:57)
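"Connection refused" means the TCP connection itself was rejected, before any HTTP or TLS traffic was exchanged. Before routing requests through a proxy, it can help to confirm that something is actually listening on the proxy port. The following is a minimal sketch of such a check; the class name `PortCheck` and the method `canConnect` are hypothetical helpers, not part of the original demo, and the timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers ConnectException ("Connection refused") as well as timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // 127.0.0.1:19180 is the local proxy address used in this article.
        System.out.println("proxy reachable = " + canConnect("127.0.0.1", 19180, 2000));
    }
}
```

If this prints `false` for the proxy address, the proxy is not listening there and the request code further down would fail in the same way.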

Below is the core Java code that uses the proxy, starting with the constants:

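// IP of the local proxy server
private static String proxyHost = "127.0.0.1";
// Port of the local proxy server
private static int proxyPort = 19180;
// Search page URL; "shopping-site-address" is a placeholder for the real host
String url = "https://shopping-site-address/search.html?p=&page=0";
// Local save directory
private static String dir = "d:\\jshop";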
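As a small sketch of how these constants become a proxy object: `java.net.Proxy` only describes the proxy, and nothing is connected until a request is made with it. The class name `ProxyConfig` and both method names are hypothetical. The second method shows a JVM-wide alternative using the standard `http.proxyHost` / `https.proxyHost` system properties, which I did not use in the demo but which avoids passing a `Proxy` to every `openConnection` call.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

public class ProxyConfig {

    // Builds an HTTP proxy descriptor; no connection is made here.
    public static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // JVM-wide alternative: standard Java networking properties.
    // Set them before opening any connections.
    public static void useGlobalProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", String.valueOf(port));
    }

    public static void main(String[] args) {
        Proxy proxy = httpProxy("127.0.0.1", 19180);
        System.out.println(proxy.type() + " " + proxy.address());
    }
}
```

The per-connection `Proxy` keeps the choice explicit for each request, which matches the `openProxy` flag used in the demo; the system properties affect every `HttpURLConnection` in the process.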

The request code combining the proxy with HttpURLConnection. This one is an HTTPS GET request:

/**
 * GET request.
 *
 * @author renkai721
 * @date 2020-07-24 16:08:02
 * @param url       the URL to request
 * @param openProxy whether to route the request through the proxy
 * @return the response body, or "" on failure
 */
public static String getHttps(String url, boolean openProxy) {
    BufferedReader in = null;
    StringBuilder sb = new StringBuilder();
    try {
        URL realUrl = new URL(url);
        // Open the connection to the URL
        HttpURLConnection connection;
        if (openProxy) {
            // Proxy mode
            System.out.println("Using proxy mode");
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
            connection = (HttpURLConnection) realUrl.openConnection(proxy);
        } else {
            System.out.println("Not using proxy mode");
            connection = (HttpURLConnection) realUrl.openConnection();
        }
        // HTTPS: skip certificate verification
        SSLContext ctx = MyX509TrustManagerUtils();
        ((HttpsURLConnection) connection).setSSLSocketFactory(ctx.getSocketFactory());
        ((HttpsURLConnection) connection).setHostnameVerifier(new HostnameVerifier() {
            @Override
            public boolean verify(String arg0, SSLSession arg1) {
                return true;
            }
        });
        // Common request headers
        connection.setRequestProperty("connection", "Keep-Alive");
        connection.setRequestProperty("user-agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36");
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        connection.setRequestMethod("GET");
        // Establish the connection
        connection.connect();
        // Pick the stream to read: the body on success, the error body otherwise
        int code = connection.getResponseCode();
        InputStream body;
        if (code == HttpURLConnection.HTTP_OK
                || code == HttpURLConnection.HTTP_CREATED
                || code == HttpURLConnection.HTTP_ACCEPTED) {
            System.out.println("Connected, start reading data");
            body = connection.getInputStream();
        } else {
            System.out.println("Request failed, please check the VPN proxy IP and port");
            body = connection.getErrorStream();
        }
        if (body == null) {
            // getErrorStream() returns null when the server sent no error body
            return "";
        }
        in = new BufferedReader(new InputStreamReader(body));
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line);
        }
        // System.out.println("Fetched data=" + sb.toString());
        return sb.toString();
    } catch (Exception e) {
        System.out.println("Failed to fetch data, please check the VPN proxy IP and port");
        e.printStackTrace();
    } finally {
        try {
            if (in != null) {
                in.close();
            }
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    }
    return "";
}
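The reading part of the GET method can also be written with try-with-resources, which closes the reader without a `finally` block. This is only a sketch under my own assumptions: the class `ResponseReader` and method `readAll` are hypothetical, and the stream is assumed to be UTF-8. Unlike the `readLine` loop, it keeps line breaks in the result.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ResponseReader {

    // Reads the whole stream as UTF-8 text. Returns "" for a null stream,
    // since HttpURLConnection.getErrorStream() may return null.
    public static String readAll(InputStream stream) throws IOException {
        if (stream == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                // Append only the characters actually read in this round
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "<html>\n<body>ok</body>\n</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(sample)));
    }
}
```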

This one is an HTTPS POST request:

/**
 * POST request.
 *
 * @param url     the URL to send the request to
 * @param param   request parameters in the form name1=value1&name2=value2
 * @param isproxy whether to route the request through the proxy
 * @return the response body
 */
public static String sendPost(String url, String param, boolean isproxy) {
    OutputStreamWriter out = null;
    BufferedReader in = null;
    StringBuilder result = new StringBuilder();
    try {
        URL realUrl = new URL(url);
        HttpURLConnection conn;
        if (isproxy) {
            // Proxy mode
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
            conn = (HttpURLConnection) realUrl.openConnection(proxy);
        } else {
            conn = (HttpURLConnection) realUrl.openConnection();
        }
        // HTTPS only: skip certificate verification
        if (url.startsWith("https")) {
            SSLContext ctx = MyX509TrustManagerUtils();
            ((HttpsURLConnection) conn).setSSLSocketFactory(ctx.getSocketFactory());
            ((HttpsURLConnection) conn).setHostnameVerifier(new HostnameVerifier() {
                // During the handshake, if the URL's host name does not match the server's
                // identified host name, this callback decides whether to allow the connection.
                @Override
                public boolean verify(String arg0, SSLSession arg1) {
                    return true;
                }
            });
        }
        // These two lines are required for a POST request
        conn.setDoOutput(true);
        conn.setDoInput(true);
        conn.setRequestMethod("POST");
        // Common request headers
        conn.setRequestProperty("connection", "Keep-Alive");
        conn.setRequestProperty("user-agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36");
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.connect();
        // Get the output stream of the connection and send the request parameters
        out = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
        out.write(param);
        out.flush();
        // Pick the stream to read: the body on success, the error body otherwise
        int code = conn.getResponseCode();
        InputStream body;
        if (code == HttpURLConnection.HTTP_OK
                || code == HttpURLConnection.HTTP_CREATED
                || code == HttpURLConnection.HTTP_ACCEPTED) {
            body = conn.getInputStream();
        } else {
            body = conn.getErrorStream();
        }
        if (body != null) {
            in = new BufferedReader(new InputStreamReader(body, "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                result.append(line);
            }
        }
    } catch (Exception e) {
        System.out.println("The POST request failed with an exception!");
        e.printStackTrace();
    } finally {
        // Close the output and input streams
        try {
            if (out != null) {
                out.close();
            }
            if (in != null) {
                in.close();
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
    return result.toString();
}
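The `param` string has to be in `name1=value1&name2=value2` form, and because the request is sent as `application/x-www-form-urlencoded`, names and values containing spaces, `&`, or non-ASCII text need to be URL-encoded first. Here is a rough sketch of building such a string; `FormEncoder` and `encode` are hypothetical names, and the parameter names in `main` are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormEncoder {

    // Joins the entries as name=value pairs separated by '&',
    // URL-encoding every name and value as UTF-8.
    public static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("p", "a b&c");
        params.put("page", "0");
        System.out.println(encode(params)); // prints p=a+b%26c&page=0
    }
}
```

The result can be passed directly as the `param` argument of `sendPost`.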

The core code for bypassing HTTPS certificate verification:

/**
 * Skips HTTPS certificate verification. Extending X509ExtendedTrustManager prevents
 * newer JDK versions from rejecting certificates whose algorithms violate the JDK's
 * constraints.
 */
class MyX509TrustManager extends X509ExtendedTrustManager {
    @Override
    public void checkClientTrusted(X509Certificate[] arg0, String arg1) throws CertificateException {
    }

    @Override
    public void checkServerTrusted(X509Certificate[] arg0, String arg1) throws CertificateException {
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        // The API contract asks for a non-null (possibly empty) array
        return new X509Certificate[0];
    }

    @Override
    public void checkClientTrusted(X509Certificate[] arg0, String arg1, Socket arg2) throws CertificateException {
    }

    @Override
    public void checkClientTrusted(X509Certificate[] arg0, String arg1, SSLEngine arg2)
            throws CertificateException {
    }

    @Override
    public void checkServerTrusted(X509Certificate[] arg0, String arg1, Socket arg2) throws CertificateException {
    }

    @Override
    public void checkServerTrusted(X509Certificate[] arg0, String arg1, SSLEngine arg2)
            throws CertificateException {
    }
}

// Builds an SSLContext whose only trust manager accepts every certificate
public static SSLContext MyX509TrustManagerUtils() {
    // MyX509TrustManager is an inner class of Tools, so it needs an outer instance
    TrustManager[] tm = { new Tools().new MyX509TrustManager() };
    SSLContext ctx = null;
    try {
        ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tm, null);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return ctx;
}
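The socket factory and hostname verifier only make sense on an `HttpsURLConnection`; casting a plain `http://` connection to that type throws a `ClassCastException`. A guarded version might look like the sketch below. `TlsSetup` and `apply` are hypothetical names, `example.com` is just a placeholder host, and `SSLSocketFactory.getDefault()` stands in for the factory you would get from the trust-all `SSLContext` above. No network traffic happens here, because `openConnection` does not connect.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLSocketFactory;

public class TlsSetup {

    // Applies the factory and verifier only when the connection is HTTPS.
    // Returns true if they were applied, false for any other connection type.
    public static boolean apply(URLConnection conn, SSLSocketFactory factory,
                                HostnameVerifier verifier) {
        if (!(conn instanceof HttpsURLConnection)) {
            return false;
        }
        HttpsURLConnection https = (HttpsURLConnection) conn;
        https.setSSLSocketFactory(factory);
        https.setHostnameVerifier(verifier);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in factory; the demo would pass ctx.getSocketFactory() instead
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        // Accepts every host name, like the verifier in the demo
        HostnameVerifier acceptAll = (host, session) -> true;

        URLConnection secure = new URL("https://example.com/").openConnection();
        URLConnection plain = new URL("http://example.com/").openConnection();
        System.out.println("https applied = " + apply(secure, factory, acceptAll));
        System.out.println("http applied = " + apply(plain, factory, acceptAll));
    }
}
```

Setting these on the individual connection, as the demo does, keeps the relaxed checks limited to that one request instead of changing the defaults for the whole JVM.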

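The code for downloading an image from an external site to the local disk. Note the difference between files and text: a file is binary data, so it is read and written with InputStream and OutputStream rather than BufferedReader. This is the key point.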

/**
 * Downloads an image to the local disk.
 *
 * @author renkai721
 * @date 2020-07-24 17:03:53
 * @param url       the image URL
 * @param openProxy whether to route the request through the proxy
 * @param dir       the local directory to save the image in
 */
public static void getHttpsImage(String url, boolean openProxy, String dir) {
    System.out.println("Image download URL=" + url);
    InputStream in = null;
    OutputStream os = null;
    try {
        URL realUrl = new URL(url);
        // Open the connection to the URL
        HttpURLConnection connection;
        if (openProxy) {
            // Proxy mode
            System.out.println("Using proxy mode");
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
            connection = (HttpURLConnection) realUrl.openConnection(proxy);
        } else {
            System.out.println("Not using proxy mode");
            connection = (HttpURLConnection) realUrl.openConnection();
        }
        // HTTPS: skip certificate verification
        SSLContext ctx = MyX509TrustManagerUtils();
        ((HttpsURLConnection) connection).setSSLSocketFactory(ctx.getSocketFactory());
        ((HttpsURLConnection) connection).setHostnameVerifier(new HostnameVerifier() {
            @Override
            public boolean verify(String arg0, SSLSession arg1) {
                return true;
            }
        });
        // Common request headers
        connection.setRequestProperty("connection", "Keep-Alive");
        connection.setRequestProperty("user-agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36");
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        connection.setRequestMethod("GET");
        // Establish the connection
        connection.connect();
        // Read the response as a raw byte stream; do not save an error page as an image
        int code = connection.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK
                && code != HttpURLConnection.HTTP_CREATED
                && code != HttpURLConnection.HTTP_ACCEPTED) {
            System.out.println("Cannot download the image, please check the VPN proxy IP and port");
            return;
        }
        System.out.println("Connected, start downloading the image");
        in = connection.getInputStream();
        // Use the last path segment of the URL as the file name
        String fileName = url.substring(url.lastIndexOf("/") + 1);
        File dirFile = new File(dir);
        if (!dirFile.exists()) {
            dirFile.mkdirs();
        }
        File image = new File(dirFile, fileName + ".jpg");
        // FileOutputStream creates the file if it does not exist yet
        os = new FileOutputStream(image);
        int len;
        byte[] bytes = new byte[1024];
        while ((len = in.read(bytes)) != -1) {
            // Write only the bytes actually read, not the whole buffer
            os.write(bytes, 0, len);
        }
        os.flush();
        System.out.println("Image saved to=" + image.getAbsolutePath());
    } catch (Exception e) {
        System.out.println("Failed to fetch data, please check the VPN proxy IP and port");
        e.printStackTrace();
    } finally {
        try {
            if (in != null) {
                in.close();
            }
        } catch (Exception e2) {
            e2.printStackTrace();
        }
        try {
            if (os != null) {
                os.close();
            }
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    }
}
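The byte-copy loop is the part of a file download that is easiest to get wrong. A single `read` may return fewer bytes than the buffer holds, so only the first `len` bytes should be written each round; writing the whole buffer pads the file with stale bytes and corrupts the image. A minimal sketch of the loop in isolation follows; `StreamCopy`, `copy`, and `fileNameOf` are hypothetical names, and the URL in `main` is a placeholder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    // Copies all bytes from in to out and returns the number of bytes copied.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        int len;
        while ((len = in.read(buf)) != -1) {
            // Write exactly the bytes read in this round
            out.write(buf, 0, len);
            total += len;
        }
        out.flush();
        return total;
    }

    // Takes the last path segment of a URL as the file name.
    public static String fileNameOf(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) throws IOException {
        // 2500 bytes is deliberately not a multiple of the 1024-byte buffer
        byte[] source = new byte[2500];
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(source), target);
        System.out.println(copied + " bytes copied, target size " + target.size());
        System.out.println(fileNameOf("https://example.com/img/a.png"));
    }
}
```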

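Everything above is plain Java and needs no extra JAR files. The demo, however, uses Document to parse HTML, and for that you have to add the jsoup JAR yourself. For a plain Java project, download the JAR (the file name is jsoup-1.9.2.jar); for a Maven project, use the dependency below: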

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.9.2</version>
</dependency>

Below is some code that parses HTML with jsoup, for reference:

/**
 * Collects the detail-page links of the items on a list page.
 *
 * @author renkai721
 * @date 2020-07-24 16:07:47
 * @param parse the parsed list page
 * @return the detail-page links
 */
public static List<String> listPageInfo(Document parse) {
    List<String> list = new ArrayList<String>();
    // Locate the list container, then every item wrapper inside it
    Element elementById = parse.getElementById("itmlst");
    Elements elementsByClass = elementById.getElementsByClass("elWrap");
    int i = 0;
    for (Element element : elementsByClass) {
        i++;
        // Take the link from the first item-name element
        Elements titls = element.getElementsByClass("elName");
        String link = "";
        for (Element j : titls) {
            link = j.getElementsByTag("a").attr("href");
            break;
        }
        System.out.println("Detail page URL of item [" + i + "]=" + link);
        list.add(link);
        if (i > 0) {
            // Test code: stop after the first item
            break;
        }
    }
    return list;
}

// Prints the name, price, and images of an item and downloads each image
public static void goodsInfo(Document parse, boolean openProxy, String oneDir) {
    Element ItemInfo = parse.getElementById("ItemInfo");
    Elements mdItemInfoTitle = ItemInfo.getElementsByClass("mdItemInfoTitle");
    for (Element element : mdItemInfoTitle) {
        // Item name
        String name = element.getElementsByTag("h2").text();
        System.out.println("Item name=" + name);
    }
    Elements elNum = ItemInfo.getElementsByClass("elNum");
    for (Element numElement : elNum) {
        // Item price
        String num = numElement.getElementsByTag("span").text();
        System.out.println("Item price=" + num + "円");
    }
    Element itmbasic = parse.getElementById("itmbasic");
    Elements elThumbnail = itmbasic.getElementsByClass("elThumbnail");
    for (Element numElement : elThumbnail) {
        String src = numElement.getElementsByTag("img").attr("src");
        System.out.println("Image URL=" + src);
        String alt = numElement.getElementsByTag("img").attr("alt");
        System.out.println("Image description=" + alt);
        getHttpsImage(src, openProxy, oneDir);
    }
}
