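Disclaimer: this article is for learning and research only. Do not use it for any illegal purpose; if you do, you bear the consequences. If anything here infringes your rights, please let me know and I will remove it. Thanks!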

Project scenario:

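Most engineers who have written crawlers have probably come across the China Land Market website (中国土地市场网, landchina.com). There are plenty of write-ups online about scraping it, but after reading a few from previous years I found that the site has since updated its anti-scraping measures, so let's use it for practice once more.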

Solution:

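1. First we request the URL. The first thing we see is a prompt asking for a 5-character captcha. This captcha is very basic, though: pytesseract, a Python wrapper for the Tesseract OCR engine, can recognize it directly. Heavily distorted or overlapping captchas would need a training set and a machine-learning approach instead.
For installing pytesseract, I just grabbed a link; it should work. Note that pytesseract is only a wrapper, so the Tesseract OCR engine itself must be installed as well.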
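OCR output usually carries stray whitespace or punctuation, so it helps to normalize it before sending it anywhere. The following is a minimal sketch, not part of the original code: `clean_captcha` is a hypothetical helper, and it assumes the captcha consists of exactly 5 ASCII letters or digits (the article only says it has 5 characters).

```python
import re

CAPTCHA_LENGTH = 5  # the article describes a 5-character code


def clean_captcha(raw_text, length=CAPTCHA_LENGTH):
    """Reduce OCR text to ASCII letters/digits; return None if the length is wrong."""
    # Drop spaces, newlines, form feeds and punctuation that OCR engines
    # tend to emit around the recognized characters.
    candidate = re.sub(r'[^0-9A-Za-z]', '', raw_text)
    return candidate if len(candidate) == length else None


if __name__ == '__main__':
    print(clean_captcha(' 4 8 2 1 7\n\x0c'))  # 48217
    print(clean_captcha('48 2'))              # None
```

A `None` result is a cheap signal that recognition failed and a fresh captcha should be requested instead of submitting a wrong code.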

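2. Next we enter the correct captcha and watch the requests the page makes. There are four: the first returns the captcha input page; the second is the captcha itself; the third sends the parameter security_verify_img, generated from the 5-character code we entered, to validate the captcha; the fourth re-requests the original page with the cookie once validation has succeeded. So we have two things to do: recognize the captcha, and work out how security_verify_img is generated.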
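The request sequence above can be sketched as a small retry loop. This is an illustration under assumptions, not the author's code: `is_firewall_page` and `fetch_with_captcha` are hypothetical names, the two callables are supplied by the caller, and the firewall check is a heuristic based on identifiers that appear in the verification page's script.

```python
def is_firewall_page(html):
    """Heuristic: the verification page embeds the YunsuoAutoJump script."""
    return 'YunsuoAutoJump' in html and 'security_verify_img' in html


def fetch_with_captcha(fetch_page, solve_and_submit, max_attempts=5):
    """Fetch the target page, solving the captcha whenever the firewall page comes back.

    fetch_page() -> str             returns the HTML of the target URL
    solve_and_submit(html) -> None  recognizes the captcha found in `html`
                                    and sends the security_verify_img request
    """
    for _ in range(max_attempts):
        html = fetch_page()
        if not is_firewall_page(html):
            return html         # the real content came back
        solve_and_submit(html)  # recognize the captcha and verify it
    raise RuntimeError('captcha not passed after %d attempts' % max_attempts)
```

Passing the two steps in as functions keeps the loop independent of any HTTP or OCR library, and the attempt limit covers the case where OCR misreads the image.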

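3. Captcha recognition is already covered, so now we need to find the JS that produces security_verify_img. Let's start with the data returned by the first request, which is an HTML page.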
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta http-equiv="Cache-Control" content="no-store, no-cache, must-revalidate, post-check=0, pre-check=0"/><meta http-equiv="Connection" content="Close"/>
<script type="text/javascript">
function EnterJump() {
    if (event.keyCode == 13) {
        YunsuoAutoJump();
    }
}
function stringToHex(str) { // the "encryption" method
    var val = "";
    for (var i = 0; i < str.length; i++) {
        if (val == "")
            val = str.charCodeAt(i).toString(16);
        else
            val += str.charCodeAt(i).toString(16);
    }
    return val;
}
function YunsuoAutoJump() {
    var text = document.getElementById("intext").value;
    if (text == "") {
        alert('验证码不能为空');
    } else {
        var curlocation = window.location.href;
        if (-1 == curlocation.indexOf("security_verify_")) {
            document.cookie = "srcurl=" + stringToHex(window.location.href) + ";path=/;";
        }
        self.location = "/default.aspx?tabid=226&security_verify_img=" + stringToHex(text);
    }
}
</script>
<title>网站防火墙</title>
<style type="text/css">.c1 {margin-left: 70px;_margin-left: 1px;}body {text-align: center;}</style>
</head>
<body onkeypress="EnterJump()"><br/><br/>
<div style="margin: 0 auto; width: 515px; height: 200px; border: 2px solid #134f7c; background: #e7eff2; font-family:微软雅黑">
<div style="text-align: center; color: white; height: 30px; font-size: 14px; background: #074773; line-height:30px">网站访问认证页面</div><br/><br/>
<div style="text-align: center; font-size: 14px;">
<table class="c1"><tr>
<td style="width: 168px; font-size: 14px;">请输入验证码后继续访问:</td>
<td style="width: 30px; text-align: left;"><input id="intext" type="text" style="width: 100px; height: 28px;"/></td>
<td style="letter-spacing: 4; text-align: center; float: left; margin-left: 8px;"><img class="verifyimg" alt="verify_img"
src="data:image/bmp;base64,Qk3aHwAAAAAAADYAAAAoAAAAZAAAABsAAAABABgAAAAAAKQfAAAAAQAAAAEAAAAAAAAAAAAA3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5
ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3Nzcpqymtry2rrSupqymtry2rrSupqymtry23Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmrrSupqymtry2rrSupqymtry2rrSupqym5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS4ODgrrSupqymtry2rrSuw8TD5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS29vbrrSuubu55ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm1tbWpqymtry2rrSupqym1tbW3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS3Nzc0tLS5ubm3Nzc0tLS5ubmAJkAAJkAAJkAAJkAAJkAAJkAAJkAAJkA5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSAJkAAJkAAJkAAJkAAJkAAJkAAJkAAJkA0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzci5yLAJkAAJkAAJkAAJkAbpRunaed39/f3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcr7OvJpAmSY1Jury65ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSmaqZAJkAAJkAAJkAAJkAaY9prLas1dbV0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLSO487AJkAFJEUQ49DQIxAPYo9Q49DQIxA0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcNoo2AJkAFZIVPYo9Q49DQIxAPYo9Q49D3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmB5YHPYo9Q49DQIxAPYo9FpMWAJkAury65ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzci5yLAJkAeph60tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcBpYGQ49DQIxAPYo9Q49DFZIVAJkAzM7M3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3NzcwsPCO487MI0wq7Cr5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmy8zLNoo2M48zs7iz0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSXZNd3Nzc0tLS5ubm3NzcdJJ0EZQRgpuC0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmkqOSAJkAf55/3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmWY9Z0tLS5ubm3Nzc0tLSf55/EJMQfJV85ubm3Nzc0tLS5ubm3Nzc0tLS3Nzc0tLS5ubm3Nzc0tLS5ubm3NzctLe0OY85PYw9r7Ov5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmvcC9NIo0P48/t7u30tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcLosuAJkA3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLStry2LY0tSIpIycvJ3Nzc0tLS5ubm3
Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSM48zAJkA0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcqa6pNY81R41HtLe05ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmsbexMIowSpBKvcC90tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmMI0wAJkA5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSf55/AJkAi5yL5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcLosuAJkA3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcn6ifLo8uT41PvL685ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmp6+nKosqU5FTxcfF0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSM48zAJkA0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcdJJ0AJkAkqOS0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmMI0wAJkA5ubm3Nzc0tLS5ubm3Nzc0tLS3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcl6OXJpAmVY5VxsbG5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmnqqeI40jWZJZz9DP0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcxsbGjKOM3Nzc0tLS5ubmkqOSF5AXAJkA3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmq7KrLosuTJBMw8XD0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS2dnZhp2G0tLS5ubm3Nzci5yLGZIZAJkA0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcRYpFAJkAWY9Z0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmSY1JAJkAXZNd3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmw8XDFZAVaZZpZJFkYI1gI5AjAJkAZY5l5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmYZFhAJkAsrqy3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcury6F5IXZJFkYI1gaZZpIY8hAJkAbpdu3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmnqqeG48bAJkA3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSprGmHJAcAJkA5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSzM7MAJkAAJkAAJkAAJkAAJkAaJZo0dLR0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSZpVmAJkAo6qj5ubm3Nzc0
tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmw8XDAJkAAJkAAJkAAJkAAJkAY5FjyMjI5ubm3Nzc0tLS5ubm3Nzc0tLS3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmMI0wAJkA5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSM48zAJkA0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcury6AJkALY0tg5iDj6WPiZ+JwMHA5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcpqymL48vQIxAw8TD5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSzM7MAJkAK4srj6WPiZ+Jg5iD0tPS3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSM48zAJkA0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcLosuAJkA3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmw8XDAJkATJBM3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3NzcRYpFAJkAw8XD0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcury6AJkASY1J0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3NzcTYpN5ubm3Nzc0tLS5ubmdJZ0DZMNAJkA3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmUY5R0tLS5ubm3Nzc0tLSeZt5DpQOAJkA5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSzM7MAJkARYpF5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmSY1JAJkAzM7M3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmw8XDAJkATJBM3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS3Nzc0tLS5ubm3Nzc0tLS5ubmBpYGNIo0OY85N403NIo0E5MTAJkAi5yL5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSBpcGN403NIo0OY85N403EZIRAJkAkqOS0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcury6AJkAEpISNIo0OY85N403NIo00tPS3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmN403NIo0OY85N403NIo0E5MTAJkALosu5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSzM7MAJkAEZIROY85N403NIo0OY85ycrJ0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLSoa+hAJkAAJkAAJkAAJkAAJkAjaSN1tbW0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzck6GTAJkAAJkAAJkAAJkAAJkAgZeB4ODg3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubmw8XDAJkAAJkAAJkAAJkAAJkAAJkAury65ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLSAJkAAJkAAJkAAJkAAJkAAJkAAJkAAJkA0tLS5
ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcury6AJkAAJkAAJkAAJkAAJkAAJkAzM7M3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3Nzczs7Ov8O/t7u3r7Ovv8O/t7u3ysrK5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm19fXr7Ovv8O/t7u3r7Ovv8O/09TT0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS4eHht7u3r7Ovv8O/t7u3r7Ovv8O/19fX0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzcr7Ovv8O/t7u3r7Ovv8O/t7u3r7Ovv8O/3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm19fXr7Ovv8O/t7u3r7Ovv8O/t7u3zs7O5ubm3Nzc0tLS5ubm3Nzc0tLS3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3
Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS5ubm3Nzc0tLS"/></td></tr><tr><td colspan="3"><input type="submit" value="点击继续访问网站" onclick='YunsuoAutoJump()' style="height: 30px; width: 158px; margin-top: 20px; margin-left: 100px;"/></td></tr></table></div></div></body>
</html>
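The captcha is embedded in this page as a base64 data URI on the `verifyimg` image tag, so it can be pulled out of the HTML directly. A minimal sketch, assuming the `src="data:image/bmp;base64,..."` form shown above; `extract_captcha_bytes` is a hypothetical helper name.

```python
import base64
import re

# Matches the inline image payload of the captcha <img> tag.
DATA_URI_PATTERN = re.compile(r'src="data:image/bmp;base64,([^"]+)"')


def extract_captcha_bytes(html):
    """Return the decoded BMP bytes of the inline captcha, or None if absent."""
    match = DATA_URI_PATTERN.search(html)
    if match is None:
        return None
    # Remove any whitespace that line wrapping may have introduced.
    payload = re.sub(r'\s+', '', match.group(1))
    return base64.b64decode(payload)


if __name__ == '__main__':
    fake_bmp = b'BM' + bytes(10)  # stand-in bytes, not a real image
    sample = ('<img class="verifyimg" alt="verify_img" '
              'src="data:image/bmp;base64,%s"/>'
              % base64.b64encode(fake_bmp).decode('ascii'))
    data = extract_captcha_bytes(sample)
    print(data[:2])  # BMP files start with the two bytes b'BM'
```

Checking that the decoded bytes start with `b'BM'` is a quick sanity test before handing the image to OCR.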
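4. Reading the stringToHex function in the page above, we can see that the "encryption" is nothing more than converting each character of the 5-character captcha to hexadecimal.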
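The same conversion is easy to reproduce in pure Python, without a JS runtime. A minimal sketch: `string_to_hex` and `build_verify_path` are hypothetical names, and the path is the one the page's script assigns to `self.location`.

```python
import binascii


def string_to_hex(text):
    """Mirror the page's stringToHex: hex of each character code, concatenated."""
    # For ASCII input, ord(ch) equals JavaScript's str.charCodeAt(i), and
    # format(..., 'x') matches toString(16) (lowercase, no zero padding).
    return ''.join(format(ord(ch), 'x') for ch in text)


def build_verify_path(captcha):
    """Build the relative URL that the page redirects to after input."""
    return '/default.aspx?tabid=226&security_verify_img=' + string_to_hex(captcha)


if __name__ == '__main__':
    print(string_to_hex('12345'))  # 3132333435
    # For printable ASCII the result matches binascii.hexlify on the bytes.
    print(string_to_hex('aB3xY') == binascii.hexlify(b'aB3xY').decode('ascii'))  # True
    print(build_verify_path('12345'))
```

The two approaches only diverge for character codes below 0x10, where the JS version emits a single hex digit and `hexlify` emits two; captcha characters never fall in that range.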

5. Now let's put it all together. The complete code is below.
# -*- coding: utf-8 -*-
import base64
import binascii
import re

import execjs
import pytesseract
import requests
from PIL import Image

# Target page: http://www.landchina.com/default.aspx?tabid=226


def str_to_hexStr(string):
    # Pure-Python equivalent of the JS below: string -> hex string
    str_bin = string.encode('utf-8')
    return binascii.hexlify(str_bin).decode('utf-8')


# The "encryption" JS from the verification page: string -> hex
js_str = '''
function stringToHex(str) {
    var val = "";
    for (var i = 0; i < str.length; i++) {
        if (val == "") val = str.charCodeAt(i).toString(16);
        else val += str.charCodeAt(i).toString(16);
    }
    return val;
}
'''
js = execjs.compile(js_str)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}
url = 'http://www.landchina.com/default.aspx?tabid=226'
session = requests.session()
yzm_info = session.get(url=url, headers=headers, verify=False)

# Extract the base64-encoded captcha from the page and save it as an image
base64_data = re.compile('src="data:image/bmp;base64,(.*?)"').findall(yzm_info.text)[0]
ori_image_data = base64.b64decode(base64_data)
with open('yzm.bmp', 'wb') as fout:
    fout.write(ori_image_data)

# Recognize the simple captcha
image = Image.open('yzm.bmp')
result = pytesseract.image_to_string(image).replace(' ', '').replace('\n', '')
print('Recognized captcha:', result)

# "Encrypt" the parameter
data = js.call('stringToHex', result)
print('Encrypted parameter:', data)
new_url = 'http://www.landchina.com/' + "default.aspx?tabid=226&security_verify_img=" + data
print('Captcha verification URL:', new_url)

# Submit the captcha for verification
session.get(new_url, headers=headers, verify=False)

# Re-request the original page
response = session.get(url, headers=headers, verify=False)
provinces = re.compile('</td><td class="queryCellBordy">(.*?)</td>').findall(response.text)
print(provinces)

# Request the next page. __VIEWSTATE and __EVENTVALIDATION are fixed for this
# category; TAB_QuerySubmitPagerData is the page number.
data = {'__VIEWSTATE': '/wEPDwUJNjkzNzgyNTU4D2QWAmYPZBYIZg9kFgICAQ9kFgJmDxYCHgdWaXNpYmxlaGQCAQ9kFgICAQ8WAh4Fc3R5bGUFIEJBQ0tHUk9VTkQtQ09MT1I6I2YzZjVmNztDT0xPUjo7ZAICD2QWAgIBD2QWAmYPZBYCZg9kFgJmD2QWBGYPZBYCZg9kFgJmD2QWAmYPZBYCZg9kFgJmDxYEHwEFIENPTE9SOiNEM0QzRDM7QkFDS0dST1VORC1DT0xPUjo7HwBoFgJmD2QWAgIBD2QWAmYPDxYCHgRUZXh0ZWRkAgEPZBYCZg9kFgJmD2QWAmYPZBYEZg9kFgJmDxYEHwEFhwFDT0xPUjojRDNEM0QzO0JBQ0tHUk9VTkQtQ09MT1I6O0JBQ0tHUk9VTkQtSU1BR0U6dXJsKGh0dHA6Ly93d3cubGFuZGNoaW5hLmNvbS9Vc2VyL2RlZmF1bHQvVXBsb2FkL3N5c0ZyYW1lSW1nL3hfdGRzY3dfc3lfamhnZ18wMDAuZ2lmKTseBmhlaWdodAUBMxYCZg9kFgICAQ9kFgJmDw8WAh8CZWRkAgIPZBYCZg9kFgJmD2QWAmYPZBYCZg9kFgJmD2QWAmYPZBYEZg9kFgJmDxYEHwEFIENPTE9SOiNEM0QzRDM7QkFDS0dST1VORC1DT0xPUjo7HwBoFgJmD2QWAgIBD2QWAmYPDxYCHwJlZGQCAg9kFgJmD2QWBGYPZBYCZg9kFgJmD2QWAmYPZBYCZg9kFgJmD2QWAmYPFgQfAQUgQ09MT1I6I0QzRDNEMztCQUNLR1JPVU5ELUNPTE9SOjsfAGgWAmYPZBYCAgEPZBYCZg8PFgIfAmVkZAICD2QWBGYPZBYCZg9kFgJmD2QWAmYPZBYCAgEPZBYCZg8WBB8BBYYBQ09MT1I6IzAwMDAwMDtCQUNLR1JPVU5ELUNPTE9SOjtCQUNLR1JPVU5ELUlNQUdFOnVybChodHRwOi8vd3d3LmxhbmRjaGluYS5jb20vVXNlci9kZWZhdWx0L1VwbG9hZC9zeXNGcmFtZUltZy94X3Rkc2N3X3p5X2dkamhfMDEuZ2lmKTsfAwUCNDYWAmYPZBYCAgEPZBYCZg8PFgIfAmVkZAIBD2QWAmYPZBYCZg9kFgJmD2QWAgIBD2QWAmYPFgQfAQUgQ09MT1I6I0QzRDNEMztCQUNLR1JPVU5ELUNPTE9SOjsfA2QWAmYPZBYCAgEPZBYCZg8PFgIfAmVkZAIDD2QWAgIDDxYEHglpbm5lcmh0bWwFiAw8UD48QlI+PC9QPjxUQUJMRT48VEJPRFk+PFRSIGNsYXNzPWZpcnN0Um93PjxURCBzdHlsZT0iQk9SREVSLUJPVFRPTTogMXB4IHNvbGlkOyBCT1JERVItTEVGVDogMXB4IHNvbGlkOyBCT1JERVItVE9QOiAxcHggc29saWQ7IEJPUkRFUi1SSUdIVDogMXB4IHNvbGlkOyBib3JkZXI6MHB4IHNvbGlkIiB2QWxpZ249dG9wIHdpZHRoPTM3MD48UCBzdHlsZT0iVEVYVC1BTElHTjogY2VudGVyIj48QSBocmVmPSJodHRwczovL3d3dy5sYW5kY2hpbmEuY29tLyIgdGFyZ2V0PV9zZWxmPjxJTUcgdGl0bGU9dGRzY3dfbG9nZTEucG5nIGFsdD10ZHNjd19sb2dlMS5wbmcgc3JjPSJodHRwOi8vMjE4LjI0Ni4yMi4xNjYvbmV3bWFuYWdlL3VlZGl0b3IvdXRmOC1uZXQvbmV0L3VwbG9hZC9pbWFnZS8yMDIwMDYxMC82MzcyNzQwNjM0Mjg3NzExMDgxMTExMzEyLnBuZyI+PC9BPjwvUD48L1REPjxURCBzdHlsZT0iQk9SREVSLUJPVFRPTTogMXB4IHNvbGlkOyBCT1JERVItTEVGVDogMXB4IHNvbGlkOyBXT1JELUJSRUFLOiBi
cmVhay1hbGw7IEJPUkRFUi1UT1A6IDFweCBzb2xpZDsgQk9SREVSLVJJR0hUOiAxcHggc29saWQ7Ym9yZGVyOjBweCBzb2xpZCIgdkFsaWduPXRvcCB3aWR0aD02MjA+PFNQQU4gc3R5bGU9IkZPTlQtRkFNSUxZOiDlrovkvZMsIFNpbVN1bjsgQ09MT1I6IHJnYigyNTUsMjU1LDI1NSk7IEZPTlQtU0laRTogMTJweCI+5Li75Yqe77ya6Ieq54S26LWE5rqQ6YOo5LiN5Yqo5Lqn55m76K6w5Lit5b+D77yI6Ieq54S26LWE5rqQ6YOo5rOV5b6L5LqL5Yqh5Lit5b+D77yJPC9TUEFOPiA8UD48U1BBTiBzdHlsZT0iRk9OVC1GQU1JTFk6IOWui+S9kywgU2ltU3VuOyBDT0xPUjogcmdiKDI1NSwyNTUsMjU1KTsgRk9OVC1TSVpFOiAxMnB4Ij7mjIflr7zljZXkvY3vvJroh6rnhLbotYTmupDpg6joh6rnhLbotYTmupDlvIDlj5HliKnnlKjlj7gmbmJzcDsgJm5ic3A75oqA5pyv5pSv5oyB77ya5rWZ5rGf6Ie75ZaE56eR5oqA6IKh5Lu95pyJ6ZmQ5YWs5Y+4PC9TUEFOPiA8UD48U1BBTiBzdHlsZT0iRk9OVC1GQU1JTFk6IOWui+S9kywgU2ltU3VuOyBDT0xPUjogcmdiKDI1NSwyNTUsMjU1KTsgRk9OVC1TSVpFOiAxMnB4Ij7kuqxJQ1DlpIcxMjAzOTQxNOWPty00Jm5ic3A7ICZuYnNwO+S6rOWFrOe9keWuieWkhzExMDEwMjAyMDA4OTkwJm5ic3A7ICZuYnNwO+mCrueuse+8mmxhbmRjaGluYTIxOEAxNjMuY29tJm5ic3A7Jm5ic3A7PHNjcmlwdCB0eXBlPSJ0ZXh0L2phdmFzY3JpcHQiPnZhciBfYmRobVByb3RvY29sID0gKCgiaHR0cHM6IiA9PSBkb2N1bWVudC5sb2NhdGlvbi5wcm90b2NvbCkgPyAiIGh0dHBzOi8vIiA6ICIgaHR0cHM6Ly8iKTtkb2N1bWVudC53cml0ZSh1bmVzY2FwZSgiJTNDc2NyaXB0IHNyYz0nIiArIF9iZGhtUHJvdG9jb2wgKyAiaG0uYmFpZHUuY29tL2guanMlM0Y4Mzg1Mzg1OWM3MjQ3YzViMDNiNTI3ODk0NjIyZDNmYScgdHlwZT0ndGV4dC9qYXZhc2NyaXB0JyUzRSUzQy9zY3JpcHQlM0UiKSk7PC9zY3JpcHQ+PC9TUEFOPiA8L1A+PC9UUj48L1RCT0RZPjwvVEFCTEU+PFA+Jm5ic3A7PC9QPh8BBWRCQUNLR1JPVU5ELUlNQUdFOnVybChodHRwOi8vd3d3LmxhbmRjaGluYS5jb20vVXNlci9kZWZhdWx0L1VwbG9hZC9zeXNGcmFtZUltZy94X3Rkc2N3MjAxM195d18xLmpwZyk7ZGRPoM15SkYIwI+YGp7WBh8rFm48PdVXWLbsX76YGSzsPA==','__EVENTVALIDATION': '/wEdAAJmxsqFeaX8l/dmLp/wbEpkCeA4P5qp+tM6YGffBqgTjTXxx6PITEWQGvI74e4hgw9pF18pt/2sgOwprHrEnzM3','hidComName': 'default','TAB_QuerySubmitConditionData': '','TAB_QuerySubmitOrderData': '','TAB_RowButtonActionControl': '','TAB_QuerySubmitPagerData': '2','TAB_QuerySubmitSortData': ''
}
response = session.post(url, headers=headers, data=data, verify=False)
provinces = re.compile('</td><td class="queryCellBordy">(.*?)</td>').findall(response.text)
print('Next page data:', provinces)
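The code above hard-codes `__VIEWSTATE` and `__EVENTVALIDATION`. As an alternative sketch, they could be read from the listing page before each POST. This assumes the page exposes them as hidden `<input>` fields with `name` before `value`, as ASP.NET WebForms pages usually do; that has not been checked against this site, and `build_page_form` is a hypothetical helper.

```python
import html
import re

# Captures (field name, raw value) for the two WebForms hidden inputs.
HIDDEN_INPUT = re.compile(
    r'<input[^>]*\bname="(__VIEWSTATE|__EVENTVALIDATION)"[^>]*\bvalue="([^"]*)"',
    re.IGNORECASE,
)


def build_page_form(page_html, page_number):
    """Build the POST form for `page_number` from the hidden fields in `page_html`."""
    fields = {name: html.unescape(value)
              for name, value in HIDDEN_INPUT.findall(page_html)}
    return {
        '__VIEWSTATE': fields.get('__VIEWSTATE', ''),
        '__EVENTVALIDATION': fields.get('__EVENTVALIDATION', ''),
        'hidComName': 'default',
        'TAB_QuerySubmitConditionData': '',
        'TAB_QuerySubmitOrderData': '',
        'TAB_RowButtonActionControl': '',
        'TAB_QuerySubmitPagerData': str(page_number),
        'TAB_QuerySubmitSortData': '',
    }
```

With this helper, the form for page 3 would be `build_page_form(response.text, 3)`, passed as `data=` to `session.post`. The remaining field names are copied from the form in the article's code.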