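For work, I need to refresh some web pages every half hour and check whether the data on them has been updated. Can this be automated? Searching for Java-related material turned up one keyword: HttpClient.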

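HttpClient is a widely used HTTP client library, and there is plenty of material about it. However, much of what I found online does not work with HttpClient 4.5, so I still had to figure things out myself.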

Create a new Maven project (Maven 3) in Eclipse and configure pom.xml as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>test</groupId>
    <artifactId>admin.test.httpclient</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5</version>
        </dependency>
    </dependencies>
    <build>
        <finalName>MvnTest</finalName>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.0.2</version>
                <configuration>
                    <source>1.5</source>
                    <target>1.5</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

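After running "Maven install" on pom.xml, four jar files appear under "Maven Dependencies": httpclient itself plus its transitive dependencies httpcore, commons-logging and commons-codec.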

Let's test it by sending a GET request to a well-known website and see what comes back:

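import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientTest {
    private static final String HOST = "www.sina.com";
    private static final String BASE_URL = "http://" + HOST + "/";

    public static void main(String[] args) throws ClientProtocolException, IOException {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        try {
            // Build the GET request; the URL must be absolute, i.e. start with "http://"
            HttpGet getReq = new HttpGet(BASE_URL);
            // Set request headers to mimic the Chrome browser
            getReq.addHeader("Accept", "application/json, text/javascript, */*; q=0.01");
            // Advertise only encodings HttpClient can decode itself; Chrome's "sdch" is not one of them
            getReq.addHeader("Accept-Encoding", "gzip,deflate");
            getReq.addHeader("Accept-Language", "zh-CN,zh;q=0.8");
            getReq.addHeader("Content-Type", "text/html; charset=UTF-8");
            getReq.addHeader("Host", HOST);
            getReq.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36");
            // Send the GET request
            CloseableHttpResponse rep = httpClient.execute(getReq);
            try {
                // Take the page content out of the HTTP response and decode the raw bytes
                // as UTF-8 (the page's declared encoding) instead of relying on a default charset
                HttpEntity repEntity = rep.getEntity();
                String content = new String(EntityUtils.toByteArray(repEntity), "UTF-8");
                // Print the page content
                System.out.println(content);
            } finally {
                // Close the response, which releases the underlying connection
                rep.close();
            }
        } finally {
            // Close the client
            httpClient.close();
        }
    }
}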
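The comment about the URL deserves a closer look: HttpGet needs an absolute URI because HttpClient works out which server to connect to from the request URI's scheme and host. The JDK-only sketch below illustrates the difference; the class UrlCheck and its method isAbsoluteHttpUrl are hypothetical helpers, not part of HttpClient. java.net.URI parses "www.sina.com" as a relative path with no host, while "http://www.sina.com/" yields the host www.sina.com.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UrlCheck {
    // True if the string parses as an absolute http/https URI that names a host,
    // i.e. it carries the information HttpClient needs to pick the target server
    static boolean isAbsoluteHttpUrl(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAbsoluteHttpUrl("http://www.sina.com/")); // prints true
        System.out.println(isAbsoluteHttpUrl("www.sina.com"));         // prints false: no scheme, parsed as a path
    }
}
```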

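The page content returned is shown below. Its Chinese text (for example "??°???" in the keywords meta tag) is garbled, most likely because the response's Content-Type header declared no charset: EntityUtils.toString(repEntity) then falls back to ISO-8859-1, while the page itself is UTF-8, as its meta tag states. Decoding the raw bytes explicitly, e.g. with new String(EntityUtils.toByteArray(repEntity), "UTF-8"), avoids this.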

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--[5019,2,1] published at 2015-01-01 14:46:19 from #110 by 22-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>WWW.SINA.COM</title>
<meta name="keywords" content="sina, ??°???" />
<meta name="description" content="??°???é??é??" />

<style type="text/css">
<!--
/*basic setup*/
body, div, dl, dt, dd, ul, ol, li, h1, h2, h3, h4, h5, h6, form, fieldset, legend, input, textarea, p, blockquote, th, td{margin:0;padding:0;}
body{background:#ebebed url(http://ui.sina.com/assets/img/www/bg_gradient.gif) repeat-x;font-family:Arial, Helvetica, sans-serif;min-height:100%;}
img{border:0;}
em{position:absolute;left:-9999em;}
.clearDiv{clear:both;}
#wrap{padding:50px 0 10px;margin:0 auto;width:775px}

/*Header*/
#header{position:relative;margin:0 auto;width:775px;border-bottom:1px solid #ffa600;}
#header h1{float:left;margin:0;width:275px;height:50px;background:url(http://ui.sina.com/assets/img/www/sina_id_www.gif) no-repeat top left;}
#header ul{float:left;margin:0;width:500px;height:50px;list-style:none;font-size:12px;color:#333;text-transform:capitalize;}
#header ul li{float:right;margin:30px 0 0 0;}
#header ul li a{color:#333;text-decoration:none;}
#header ul li a:hover{color:#ff9900;text-decoration:none;}

#map{position:relative;margin:0;width:775px;height:248px;}

#channel{position:relative;margin:0;width:775px;border-bottom:1px solid #ffa600;}

/*Footer*/
#footer{position:relative;margin:0 auto;width:775px;border-top:1px solid #ffa600;}
#footer ul{margin:10px auto;padding:0;width:775px;list-style:none;font-size:12px;color:#333;text-transform:capitalize;text-align:center;}
#footer ul li{display:inline;padding:2px 5px;}
#footer ul li a{color:#333;text-decoration:none;}
#footer ul li a:hover{color:#ff9900;text-decoration:none;}

/*ads*/
#ads{position:relative;margin:5px 0;padding:0;width:775px;}
#ads ul{margin:5px 0;width:775px;list-style:none;text-align:center;}
#ads ul li.bnr728{margin:5px auto;padding:0;width:775px;height:90px;}
#ads ul li.bnr545{float:left;margin:5px auto;padding:0;width:620px;height:80px;}
#ads ul li.bnr120{float:left;margin:5px auto;padding:0;width:155px;height:60px;line-height:60px;}
#ads ul li.bnr120_2{float:left;margin:5px auto;padding:0;width:155px;height:80px;line-height:80px;}

-->
</style>

<!--swfObject-->
<script type="text/javascript" src="http://ui.sina.com/assets/js/swfobject.js"></script>

<!--btn.5-->
<script type="text/javascript">
    var flashvars = {};
    var params = {};
    params.base = "";
    params.menu = "true";
    params.scale = "noscale";
    params.bgcolor = "#fff";
    params.quality = "best";
    //params.allowfullscreen = "true";
    params.salign = "c";
    params.wmode = "window";
    var attributes = {};
    swfobject.embedSWF("http://ui.sina.com/rm/toyota/091110/toyota_120x60_4_091110.swf", "btn5", "120", "60", "9.0.0", "expressInstall.swf", flashvars, params, attributes);

</script>
<!--END-->

</head>
<body>


<div id="wrap">
    <!--Header-->
    <div id="header">
        <h1><em>??°???????????±???é???§?</em></h1>
        <ul>
        <li><a href="http://english.sina.com/index.html" onclick="_S_uaTrack('global_guide', 'english');">Sina English</a></li>
        </ul>
        <div class="clearDiv"></div>
    </div>

    <!--Map-->
    <div id="map">
        <img src="http://ui.sina.com/assets/img/www/worldmap.jpg" alt="" name="map1" width="775" height="248" border="0" usemap="#Map1" id="Map1" />

<map name="Map1" id="">
<area shape="rect" coords="173,81,299,137" href="http://home.sina.com" target="_self" alt="????????°???" title="????????°???" onclick="_S_uaTrack('global_guide', 'us');" />
<area shape="rect" coords="468,81,572,129" href="http://www.sina.com.cn" target="_self" alt="????????°???" title="????????°???" onclick="_S_uaTrack('global_guide', 'beijing');" />
<area shape="rect" coords="482,145,578,184" href="http://www.sina.com.hk" target="_self" alt="é???????°???" title="é???????°???" onclick="_S_uaTrack('global_guide', 'hongkong');" />
<area shape="rect" coords="658,123,755,162" href="http://www.sina.com.tw" target="_self" alt="??°?????°???" title="??°?????°???" onclick="_S_uaTrack('global_guide', 'taipei');" />
</map>
    </div>

    <!--Channels-->
    <div id="channel">
        <img src="http://ui.sina.com/assets/img/www/categories-120918.gif" alt="" width="775" height="44" border="0" usemap="#Map4" id="Map4" />

<map name="Map4" id="">
<area shape="rect" target="_self" alt="??????" coords="4,3,76,35" href="http://us.weibo.com" onclick="_S_uaTrack('global_guide', 'weibo');" />
<area shape="rect" target="_self" alt="??????" coords="95,3,166,37" href="http://google.sina.com/" onclick="_S_uaTrack('global_guide', 'search');" />
<area shape="rect" target="_self" alt="è??é??" coords="171,2,241,38" href="http://video.sina.com" onclick="_S_uaTrack('global_guide', 'video');" />
<area shape="rect" target="_self" alt="??¤???" coords="257,3,328,39" href="http://match.sina.com/" onclick="_S_uaTrack('global_guide', 'match');" />
<area shape="rect" target="_self" alt="???é??" coords="432,3,496,36" href="http://travel.sina.com/" onclick="_S_uaTrack('global_guide', 'travel');" />
<area shape="rect" target="_self" alt="é??é??" coords="509,2,582,35" href="http://yp.sina.com/" onclick="_S_uaTrack('global_guide', 'yellow');" />
<area shape="rect" target="_self" alt="?????????" coords="590,2,679,33" href="http://sina.echineselearning.com/" onclick="_S_uaTrack('global_guide', 'chinese');" />
<area shape="rect" target="_self" alt="è?????" coords="335,3,417,38" href="http://bbs.sina.com/" onclick="_S_uaTrack('global_guide', 'bbs');" />
<area shape="rect" target="_self" alt="??????" coords="688,1,772,35" href="http://deals.sina.com" onclick="_S_uaTrack('global_guide', 'deals');" />
</map>
    </div>

    <!--ads (banners/buttons)-->
    <div id="ads">
        <ul>
            <li class="bnr728"><!--Row 1 . 728x90-->
<script type="text/javascript">
//<![CDATA[
ord=window.ord||Math.floor(Math.random()*1E16);
document.write('<script type="text/javascript" src="http://ad.doubleclick.net/adj/us.homepage/;pos=top;sz=728x90;ord=' +ord+ '?"><\/script>');
//]]>
</script>
<noscript><a href="http://ad.doubleclick.net/jump/us.homepage/;pos=top;sz=728x90;ord=123456789?" target="_blank" ><img src="http://ad.doubleclick.net/ad/us.homepage/;pos=top;sz=728x90;ord=123456789?" border="0" alt="" /></a></noscript>
<!--END . Row 1 . 728x90-->

</li>

            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/2.js"></script></li>
            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/3.js"></script></li>
            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/4.js"></script></li>
            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/5.js"></script></li>
            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/6.js"></script></li>

            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/7.js"></script></li>
            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/8.js"></script></li>
            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/9.js"></script></li>
            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/10.js"></script></li>
            <li class="bnr120"><script type="text/javascript" src="http://dailynews.sina.com/gb/ads/www/120_60/11.js"></script></li>

        </ul>

        <div class="clearDiv"></div>
    </div>
    <!--END . ads-->

    <!--Footer-->
    <div id="footer">
        <ul>
        <li><a href="http://corp.sina.com.cn/eng/">About SINA</a></li>
        <li>|</li>
        <li><a href="http://corp.sina.com.cn/eng/sina_rela_eng.htm">Investor</a></li>
        <li>|</li>
        <li><a href="http://mediakit.sina.com/">Media Kit</a></li>
        <li>|</li>
        <li><a href="http://mediakit.sina.com/contact.html">Comments or Question?</a></li>
        <br/><br/>
        <li class="copyright">Copyright &copy; 1996-2015 SINA Corporation, All Rights Reserved</li>
        </ul>
    </div>

</div>

<!--floating video-->
<div id="flvideo">
<script type="text/javascript" src="http://dailynews.sina.com/gb/ads/common/floatingvideo.js"></script>
</div>

<!--START Nielsen Online SiteCensus V6.0-->
<script type="text/javascript" src="//secure-us.imrworldwide.com/v60.js"></script>
<script type="text/javascript">
var pvar = { cid:"us-sina", content:"0", server:"secure-us"};
var feat = { surveys_enabled:1, sample_rate:0.1};
var trac = nol_t(pvar, feat);
trac.record().post().do_sample();
</script>
<noscript>
<div>
<img src="//secure-us.imrworldwide.com/cgi-bin/m?ci=us-sina&amp;cg=0&amp;cc=1&amp;ts=noscript" width="1" height="1" alt="" />
</div>
</noscript>
<!--END Nielsen Online SiteCensus V6.0-->

</body>
</html>
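To see where the garbled Chinese text in the output above most likely comes from, the JDK-only sketch below (a hypothetical class CharsetDemo, Java 7+) encodes a Chinese string as UTF-8 and then decodes those bytes as ISO-8859-1. Each UTF-8 byte turns into one wrong character. Because ISO-8859-1 maps every byte to exactly one character, the mistake is reversible while the string is still in memory; once some characters have been printed as "?", as in the output above, the original text can no longer be recovered.

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Encodes text as UTF-8, then decodes the bytes as ISO-8859-1:
    // every UTF-8 byte becomes one (wrong) character
    static String decodeAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: ISO-8859-1 maps chars U+0000..U+00FF back to the
    // original bytes, which are then decoded correctly as UTF-8
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sina = "\u65b0\u6d6a"; // the Chinese name of Sina: 2 characters, 6 bytes in UTF-8
        String garbled = decodeAsLatin1(sina);
        System.out.println(garbled.length());             // prints 6
        System.out.println(repair(garbled).equals(sina)); // prints true
    }
}
```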


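OK, that is all it takes to fetch a site's homepage. The current HttpClient handles gzip-compressed responses well: the client returned by HttpClients.createDefault() decompresses them internally, so the caller needs no special handling.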
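In HttpClient 4.x this decoding is done by a response interceptor, ResponseContentEncoding, which looks at the Content-Encoding header and wraps the entity in a decompressing stream. The sketch below imitates that idea with the JDK alone, assuming Java 8+ (for UncheckedIOException). It is not HttpClient's code: the class ContentDecodingSketch is hypothetical, and it only handles zlib-wrapped deflate data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

class ContentDecodingSketch {
    // Chooses a decompressing stream from the Content-Encoding header value and
    // reads the whole body; a missing or unknown encoding passes the bytes through unchanged
    static byte[] decode(byte[] body, String contentEncoding) {
        try {
            InputStream in = new ByteArrayInputStream(body);
            String enc = contentEncoding == null ? "" : contentEncoding.trim().toLowerCase(Locale.ROOT);
            if (enc.equals("gzip") || enc.equals("x-gzip")) {
                in = new GZIPInputStream(in);
            } else if (enc.equals("deflate")) {
                // Assumes zlib-wrapped deflate data (RFC 1950); raw deflate is not handled here
                in = new InflaterInputStream(in);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            in.close();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compresses data with gzip, standing in for a server's gzip-encoded response body
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(data);
            gz.close(); // finishes the stream and writes the gzip trailer
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] html = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] decoded = decode(gzip(html), "gzip");
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // prints <html>hello</html>
    }
}
```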

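Next, I still need to build a desktop application that polls the page every few minutes and notifies the user whether the parts of it I care about have been updated.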
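As a possible starting point for that polling step, here is a minimal sketch assuming Java 8+. The hypothetical PageWatcher class remembers a SHA-256 digest of the last content it saw and reports whether new content differs, and a ScheduledExecutorService calls a fetch function at a fixed interval. The fetcher (a Supplier<String>) is a placeholder for the HttpClient code above, and printing stands in for notifying the user; a desktop application could use java.awt.TrayIcon.displayMessage instead. In practice you would hash only the part of the page you care about, since the rest of the page may change between requests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class PageWatcher {
    // Digest of the most recently seen content; null until the first check
    private byte[] lastDigest;

    // Returns true if content differs from what the previous call saw.
    // The first call only records a baseline and returns false.
    synchronized boolean hasChanged(String content) {
        byte[] digest = sha256(content);
        boolean changed = lastDigest != null && !MessageDigest.isEqual(lastDigest, digest);
        lastDigest = digest;
        return changed;
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    // Calls fetcher immediately and then every periodMinutes minutes.
    // fetcher is a placeholder for your own code that downloads the page.
    static ScheduledExecutorService start(Supplier<String> fetcher, long periodMinutes) {
        PageWatcher watcher = new PageWatcher();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                if (watcher.hasChanged(fetcher.get())) {
                    System.out.println("Page content changed"); // stand-in for a user notification
                }
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all later runs, so report and continue
                System.err.println("Check failed: " + e);
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

For example, PageWatcher.start(() -> fetchPage(), 30) would check every 30 minutes, where fetchPage is a hypothetical method of your own wrapping the request above; since a Supplier cannot throw checked exceptions, it would need to wrap IOException (e.g. in UncheckedIOException). Call shutdown() on the returned scheduler to stop polling.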

Reposted from: https://www.cnblogs.com/dsdk2008/p/4745243.html
