I. Introduction

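The hardest part of scraping Autohome (汽车之家) configuration parameters is that some of the page text is generated by JavaScript, so parts of it never appear in the downloaded HTML. The missing words are supplied by CSS `::before` rules that the script injects at runtime, and each class name below maps to one word. The site also changes its obfuscation often, so handling it with regular expressions is slow to write and needs frequent rework. I therefore chose to run the JS in a JavaScript engine instead, using PyV8 for convenience. Once the key CSS rules are extracted, the rest is straightforward. Here is the result first: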

.hs_kw0_baikeYA::before { content:"环保" }
.hs_kw1_baikeYA::before { content:"适" }
.hs_kw2_baikeYA::before { content:"摄像头" }
.hs_kw3_baikeYA::before { content:"离地间隙" }
.hs_kw4_baikeYA::before { content:"油箱" }
.hs_kw5_baikeYA::before { content:"后桥" }
.hs_kw6_baikeYA::before { content:"整备" }
.hs_kw7_baikeYA::before { content:"转速" }
.hs_kw8_baikeYA::before { content:"制动力分配" }
.hs_kw9_baikeYA::before { content:"最大" }
.hs_kw10_baikeYA::before { content:"气门数" }
.hs_kw11_baikeYA::before { content:"车门数" }
.hs_kw12_baikeYA::before { content:"差速锁" }
.hs_kw13_baikeYA::before { content:"加热" }
.hs_kw14_baikeYA::before { content:"前" }
.hs_kw15_baikeYA::before { content:"整体" }
.hs_kw16_baikeYA::before { content:"驻车" }
.hs_kw17_baikeYA::before { content:"后悬架" }
.hs_kw18_baikeYA::before { content:"排量" }
.hs_kw19_baikeYA::before { content:"油耗" }
.hs_kw20_baikeYA::before { content:"供油" }
.hs_kw21_baikeYA::before { content:"配气" }
.hs_kw22_baikeYA::before { content:"前轮距" }
.hs_kw23_baikeYA::before { content:"宽度" }
.hs_kw24_baikeYA::before { content:"成功" }
.hs_kw25_baikeYA::before { content:"综合" }
.hs_kw26_baikeYA::before { content:"天窗" }
.hs_kw27_baikeYA::before { content:"悬架" }
.hs_kw28_baikeYA::before { content:"行车电脑" }
.hs_kw29_baikeYA::before { content:"缸盖" }
.hs_kw30_baikeYA::before { content:"标准" }
.hs_kw31_baikeYA::before { content:"限滑" }
.hs_kw32_baikeYA::before { content:"放倒" }
.hs_kw33_baikeYA::before { content:"前制动器" }
.hs_kw34_baikeYA::before { content:"中央" }
.hs_kw35_baikeYA::before { content:"备胎" }
.hs_kw36_baikeYA::before { content:"电子" }
.hs_kw37_baikeYA::before { content:"功率" }
.hs_kw38_baikeYA::before { content:"合金" }
.hs_kw39_baikeYA::before { content:"排列" }
.hs_kw40_baikeYA::before { content:"调节" }
.hs_kw41_baikeYA::before { content:"风" }
.hs_kw42_baikeYA::before { content:"接口" }
.hs_kw43_baikeYA::before { content:"空气" }
.hs_kw44_baikeYA::before { content:"前悬架" }
.hs_kw45_baikeYA::before { content:"高度" }
.hs_kw46_baikeYA::before { content:"铝" }
.hs_kw47_baikeYA::before { content:"后轮胎" }
.hs_kw48_baikeYA::before { content:"仪表盘" }
.hs_kw49_baikeYA::before { content:"规格" }
.hs_kw50_baikeYA::before { content:"前排" }
.hs_kw51_baikeYA::before { content:"音源" }
.hs_kw52_baikeYA::before { content:"价" }
.hs_kw53_baikeYA::before { content:"轴距" }
.hs_kw54_baikeYA::before { content:"并线" }
.hs_kw55_baikeYA::before { content:"指" }
.hs_kw56_baikeYA::before { content:"蓝牙" }
.hs_kw57_baikeYA::before { content:"扭矩" }
.hs_kw58_baikeYA::before { content:"缸体" }
.hs_kw59_baikeYA::before { content:"长度" }
.hs_kw60_baikeYA::before { content:"氙气" }
.hs_kw61_baikeYA::before { content:"助力" }
.hs_kw62_baikeYA::before { content:"行程" }
.hs_kw63_baikeYA::before { content:"气囊" }
.hs_kw64_baikeYA::before { content:"容量" }
.hs_kw65_baikeYA::before { content:"元" }
.hs_kw66_baikeYA::before { content:"缸径" }
.hs_kw67_baikeYA::before { content:"外接" }
.hs_kw68_baikeYA::before { content:"商" }
.hs_kw69_baikeYA::before { content:"电话" }
.hs_kw70_baikeYA::before { content:"喇叭" }
.hs_kw71_baikeYA::before { content:"后排" }
.hs_kw72_baikeYA::before { content:"支撑" }
.hs_kw73_baikeYA::before { content:"独立" }
.hs_kw74_baikeYA::before { content:"全液晶" }
.hs_kw75_baikeYA::before { content:"燃油" }
.hs_kw76_baikeYA::before { content:"容积" }
.hs_kw77_baikeYA::before { content:"真皮" }
.hs_kw78_baikeYA::before { content:"无钥匙" }
.hs_kw79_baikeYA::before { content:"实测" }
.hs_kw80_baikeYA::before { content:"牵引力控制" }
.hs_kw81_baikeYA::before { content:"前轮胎" }
.hs_kw82_baikeYA::before { content:"座椅移动" }
.hs_kw83_baikeYA::before { content:"预警" }
.hs_kw84_baikeYA::before { content:"影像" }
.hs_kw85_baikeYA::before { content:"儿童座椅" }
.hs_kw86_baikeYA::before { content:"机构" }
.hs_kw87_baikeYA::before { content:"进气" }
.hs_kw88_baikeYA::before { content:"名称" }
.hs_kw89_baikeYA::before { content:"扬声器" }
.hs_kw90_baikeYA::before { content:"视频" }
.hs_kw91_baikeYA::before { content:"质保" }
.hs_kw92_baikeYA::before { content:"气缸" }
.hs_kw93_baikeYA::before { content:"驾驶" }
.hs_kw94_baikeYA::before { content:"前桥" }
.hs_kw95_baikeYA::before { content:"质量" }
.hs_kw96_baikeYA::before { content:"主动" }
.hs_kw97_baikeYA::before { content:"电池" }
.hs_kw98_baikeYA::before { content:"稳定" }
.hs_kw99_baikeYA::before { content:"材质" }
.hs_kw100_baikeYA::before { content:"后制动器" }
.hs_kw101_baikeYA::before { content:"压缩比" }
.hs_kw102_baikeYA::before { content:"单碟" }
.hs_kw103_baikeYA::before { content:"差速器" }
.hs_kw104_baikeYA::before { content:"通风" }
.hs_kw105_baikeYA::before { content:"后轮距" }
.hs_kw106_baikeYA::before { content:"号" }
.hs_kw107_baikeYA::before { content:"导" }
.hs_kw0_configYf::before { content:"后驱" }
.hs_kw1_configYf::before { content:"车门数" }
.hs_kw2_configYf::before { content:"驻车" }
.hs_kw3_configYf::before { content:"后悬架" }
.hs_kw4_configYf::before { content:"多片" }
.hs_kw5_configYf::before { content:"排量" }
.hs_kw6_configYf::before { content:"承载式" }
.hs_kw7_configYf::before { content:"供油" }
.hs_kw8_configYf::before { content:"配气" }
.hs_kw9_configYf::before { content:"综合" }
.hs_kw10_configYf::before { content:"悬架" }
.hs_kw11_configYf::before { content:"多连杆" }
.hs_kw12_configYf::before { content:"中央" }
.hs_kw13_configYf::before { content:"双叉臂式" }
.hs_kw14_configYf::before { content:"备胎" }
.hs_kw15_configYf::before { content:"电子" }
.hs_kw16_configYf::before { content:"功率" }
.hs_kw17_configYf::before { content:"排列" }
.hs_kw18_configYf::before { content:"铝" }
.hs_kw19_configYf::before { content:"轴距" }
.hs_kw20_configYf::before { content:"长度" }
.hs_kw21_configYf::before { content:"助力" }
.hs_kw22_configYf::before { content:"元" }
.hs_kw23_configYf::before { content:"商" }
.hs_kw24_configYf::before { content:"直喷" }
.hs_kw25_configYf::before { content:"独立" }
.hs_kw26_configYf::before { content:"容积" }
.hs_kw27_configYf::before { content:"实测" }
.hs_kw28_configYf::before { content:"气缸" }
.hs_kw29_configYf::before { content:"质量" }
.hs_kw30_configYf::before { content:"后制动器" }
.hs_kw31_configYf::before { content:"涡轮" }
.hs_kw32_configYf::before { content:"差速器" }
.hs_kw33_configYf::before { content:"后轮距" }
.hs_kw34_configYf::before { content:"大型车" }
.hs_kw35_configYf::before { content:"环保" }
.hs_kw36_configYf::before { content:"万" }
.hs_kw37_configYf::before { content:"离地间隙" }
.hs_kw38_configYf::before { content:"油箱" }
.hs_kw39_configYf::before { content:"整备" }
.hs_kw40_configYf::before { content:"转速" }
.hs_kw41_configYf::before { content:"年或" }
.hs_kw42_configYf::before { content:"最大" }
.hs_kw43_configYf::before { content:"气门数" }
.hs_kw44_configYf::before { content:"版" }
.hs_kw45_configYf::before { content:"宝马" }
.hs_kw46_configYf::before { content:"油耗" }
.hs_kw47_configYf::before { content:"前轮距" }
.hs_kw48_configYf::before { content:"宽度" }
.hs_kw49_configYf::before { content:"成功" }
.hs_kw50_configYf::before { content:"缸盖" }
.hs_kw51_configYf::before { content:"标准" }
.hs_kw52_configYf::before { content:"前制动器" }
.hs_kw53_configYf::before { content:"增压" }
.hs_kw54_configYf::before { content:"时间" }
.hs_kw55_configYf::before { content:"前置" }
.hs_kw56_configYf::before { content:"前悬架" }
.hs_kw57_configYf::before { content:"高度" }
.hs_kw58_configYf::before { content:"后轮胎" }
.hs_kw59_configYf::before { content:"规格" }
.hs_kw60_configYf::before { content:"价" }
.hs_kw61_configYf::before { content:"指" }
.hs_kw62_configYf::before { content:"扭矩" }
.hs_kw63_configYf::before { content:"缸体" }
.hs_kw64_configYf::before { content:"欧" }
.hs_kw65_configYf::before { content:"行程" }
.hs_kw66_configYf::before { content:"盘式" }
.hs_kw67_configYf::before { content:"缸径" }
.hs_kw68_configYf::before { content:"华" }
.hs_kw69_configYf::before { content:"燃油" }
.hs_kw70_configYf::before { content:"前轮胎" }
.hs_kw71_configYf::before { content:"进口" }
.hs_kw72_configYf::before { content:"机构" }
.hs_kw73_configYf::before { content:"进气" }
.hs_kw74_configYf::before { content:"离合器" }
.hs_kw75_configYf::before { content:"名称" }
.hs_kw76_configYf::before { content:"质保" }
.hs_kw77_configYf::before { content:"压缩比" }
.hs_kw78_configYf::before { content:"通风" }
.hs_kw79_configYf::before { content:"号" }
.hs_kw80_configYf::before { content:"导" }
.hs_kw0_optionsy::before { content:"适" }
.hs_kw1_optionsy::before { content:"摄像头" }
.hs_kw2_optionsy::before { content:"后桥" }
.hs_kw3_optionsy::before { content:"电磁" }
.hs_kw4_optionsy::before { content:"制动力分配" }
.hs_kw5_optionsy::before { content:"差速锁" }
.hs_kw6_optionsy::before { content:"加热" }
.hs_kw7_optionsy::before { content:"前" }
.hs_kw8_optionsy::before { content:"整体" }
.hs_kw9_optionsy::before { content:"驻车" }
.hs_kw10_optionsy::before { content:"成功" }
.hs_kw11_optionsy::before { content:"天窗" }
.hs_kw12_optionsy::before { content:"悬架" }
.hs_kw13_optionsy::before { content:"行车电脑" }
.hs_kw14_optionsy::before { content:"限滑" }
.hs_kw15_optionsy::before { content:"放倒" }
.hs_kw16_optionsy::before { content:"充电" }
.hs_kw17_optionsy::before { content:"中央" }
.hs_kw18_optionsy::before { content:"电子" }
.hs_kw19_optionsy::before { content:"合金" }
.hs_kw20_optionsy::before { content:"调节" }
.hs_kw21_optionsy::before { content:"风" }
.hs_kw22_optionsy::before { content:"接口" }
.hs_kw23_optionsy::before { content:"空气" }
.hs_kw24_optionsy::before { content:"铝" }
.hs_kw25_optionsy::before { content:"高度" }
.hs_kw26_optionsy::before { content:"仪表盘" }
.hs_kw27_optionsy::before { content:"音源" }
.hs_kw28_optionsy::before { content:"并线" }
.hs_kw29_optionsy::before { content:"远光灯" }
.hs_kw30_optionsy::before { content:"蓝牙" }
.hs_kw31_optionsy::before { content:"气囊" }
.hs_kw32_optionsy::before { content:"外接" }
.hs_kw33_optionsy::before { content:"电话" }
.hs_kw34_optionsy::before { content:"升" }
.hs_kw35_optionsy::before { content:"上下" }
.hs_kw36_optionsy::before { content:"喇叭" }
.hs_kw37_optionsy::before { content:"后排" }
.hs_kw38_optionsy::before { content:"支撑" }
.hs_kw39_optionsy::before { content:"华" }
.hs_kw40_optionsy::before { content:"独立" }
.hs_kw41_optionsy::before { content:"全液晶" }
.hs_kw42_optionsy::before { content:"真皮" }
.hs_kw43_optionsy::before { content:"无钥匙" }
.hs_kw44_optionsy::before { content:"牵引力控制" }
.hs_kw45_optionsy::before { content:"前后" }
.hs_kw46_optionsy::before { content:"座椅移动" }
.hs_kw47_optionsy::before { content:"预警" }
.hs_kw48_optionsy::before { content:"影像" }
.hs_kw49_optionsy::before { content:"儿童座椅" }
.hs_kw50_optionsy::before { content:"扬声器" }
.hs_kw51_optionsy::before { content:"视频" }
.hs_kw52_optionsy::before { content:"驾驶" }
.hs_kw53_optionsy::before { content:"前桥" }
.hs_kw54_optionsy::before { content:"主动" }
.hs_kw55_optionsy::before { content:"稳定" }
.hs_kw56_optionsy::before { content:"选装" }
.hs_kw57_optionsy::before { content:"材质" }
.hs_kw58_optionsy::before { content:"单碟" }
.hs_kw59_optionsy::before { content:"差速器" }
.hs_kw60_optionsy::before { content:"通风" }
.hs_kw61_optionsy::before { content:"近光灯" }
.hs_kw62_optionsy::before { content:"导" }
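Once the rules are out, turning them into a lookup table is mostly string handling. The following is a minimal sketch, assuming every rule has exactly the `.class::before { content:"..." }` shape shown above; the names `parse_rules` and `RULE_RE` are hypothetical and not part of the original code.

```python
# -*- coding: utf-8 -*-
import re

# Matches one rule such as  .hs_kw0_configYf::before { content:"后驱" }
# (assumed shape, taken from the output above).
RULE_RE = re.compile(r'\.([\w-]+)::before\s*\{\s*content:\s*"([^"]*)"\s*\}')


def parse_rules(rules_text):
    """Map each CSS class name to the text its ::before rule injects.

    rules_text may hold one rule per line, or rules joined with '#'
    the way the clscontent() helper in this article returns them.
    """
    mapping = {}
    for cls, text in RULE_RE.findall(rules_text):
        mapping[cls] = text
    return mapping


if __name__ == '__main__':
    sample = (u'.hs_kw0_configYf::before { content:"后驱" }'
              u'#.hs_kw5_configYf::before { content:"排量" }')
    for cls, text in sorted(parse_rules(sample).items()):
        print(u'%s %s' % (cls, text))
```

Because the regex stops at the closing quote, the same function handles both newline-separated and `#`-joined input without a separate split step.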

II. Environment

1. requests

pip install requests

2. PyV8

pip install PyV8

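Installing this way failed on my Windows machine, so I checked the official site. I could only find PyV8 builds for Python 2.x; it does not work with Python 3.x. Download it yourself from the official site: http://code.google.com/p/pyv8/downloads/list. I am running 64-bit Python 2.7, so I installed the 64-bit PyV8 build.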
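Since the PyV8 binary has to match both the Python major version and the interpreter's bitness, it may help to confirm those before downloading. A small sketch using only the standard library; `interpreter_info` is a hypothetical helper name, and the final import check simply reports whether PyV8 is visible to the current interpreter.

```python
import platform
import struct
import sys


def interpreter_info():
    """Return (major, minor, bits) for the running Python interpreter."""
    # Size of a C pointer in bytes, times 8, gives 32 or 64.
    bits = struct.calcsize('P') * 8
    return sys.version_info[0], sys.version_info[1], bits


if __name__ == '__main__':
    major, minor, bits = interpreter_info()
    print('Python %d.%d, %d-bit, %s' % (major, minor, bits, platform.system()))
    if major != 2:
        print('Only Python 2.x PyV8 builds were found; use Python 2.7.')
    try:
        import PyV8  # noqa: F401
        print('PyV8 imported OK')
    except ImportError:
        print('PyV8 is not installed for this interpreter')
```

Pick the PyV8 download whose architecture matches the `bits` value printed here.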

III. Approach

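The basic idea is to first find the compressed JS responsible for the missing text, then find the key methods that add the CSS rules. Inserting console.log(xxx) calls into the script and watching the console output helps locate them. Running that JS directly in PyV8 throws errors, because it expects browser objects such as document and window, so you need to add some stub code of your own to fix them. The code is as follows: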

```python
# coding=utf-8
# Python 2.7: PyV8 builds were only found for Python 2.x.
import re
import logging

import PyV8
import requests

# Minimal browser stubs so the page's obfuscated JS can run outside a browser.
# insertRule() is redirected to append each CSS rule to the global string
# `rules`, using '#' as the separator.
STUB_JS = """
var rules = '';
var document = {};
document.createElement = function() {
    return {
        sheet: {
            insertRule: function(rule, i) {
                if (rules.length == 0) {
                    rules = rule;
                } else {
                    rules = rules + '#' + rule;
                }
            }
        }
    };
};
document.querySelectorAll = function() {
    return {};
};
document.head = {};
document.head.appendChild = function() {};
var window = {};
window.decodeURIComponent = decodeURIComponent;
"""


def clscontent(alljs):
    """Execute the assembled JS in PyV8 and return the collected rules."""
    try:
        with PyV8.JSContext() as ctx:
            ctx.eval(alljs)
            return ctx.eval('rules')
    except Exception:
        logging.exception('clscontent function exception')
        return None


def makejs(html):
    """Prepend the stubs to every obfuscated (function(..)...(document); snippet."""
    try:
        js = re.findall(r'(\(function\([a-zA-Z]{2}.*?_\).*?\(document\);)', html)
        return STUB_JS + ''.join(js)
    except Exception:
        logging.exception('makejs function exception')
        return None


def main(index):
    try:
        req = requests.get('https://car.autohome.com.cn/config/series/%d.html' % index)
        alljs = makejs(req.text)
        if alljs is None:
            print('makejs error')
            return
        result = clscontent(alljs)
        if result is None:
            print('clscontent error')
            return
        for item in result.split('#'):
            print(item)
    except Exception:
        logging.exception('main function exception')


if __name__ == '__main__':
    main(153)
```
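The extracted rules only pay off once the words are put back into the page. A minimal sketch of that last step, assuming the config HTML marks each hidden word with an empty `<span class='hs_kw…'></span>` element (verify this against the live page, since the site changes its markup); `fill_spans` and `SPAN_RE` are hypothetical names, and the mapping would come from parsing the rules that main() prints.

```python
# -*- coding: utf-8 -*-
import re

# Assumed markup: an empty span whose class is one of the obfuscated names,
# e.g. <span class='hs_kw62_configYf'></span>; single or double quotes allowed.
SPAN_RE = re.compile(r'<span\s+class=(["\'])([\w-]+)\1\s*>\s*</span>')


def fill_spans(html, mapping):
    """Replace each empty obfuscated span with its text from mapping.

    Spans whose class is not in mapping are left unchanged.
    """
    def repl(match):
        cls = match.group(2)
        return mapping.get(cls, match.group(0))
    return SPAN_RE.sub(repl, html)


if __name__ == '__main__':
    mapping = {'hs_kw62_configYf': u'扭矩'}
    html = u"<td>最大<span class='hs_kw62_configYf'></span>(N·m)</td>"
    print(fill_spans(html, mapping))
```

Leaving unknown spans untouched, rather than deleting them, makes it easy to spot class names that a changed obfuscation scheme did not cover.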

IV. Afterword

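Solving this kind of problem takes solid JavaScript fundamentals. This article is for study and reference only; please do not use it for commercial purposes!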

Reposted from: https://www.cnblogs.com/qiyueliuguang/p/8144248.html
