Scraping car brand, series, and model data from Autohome (汽车之家)

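You need Python 3 and the libraries imported at the top of the script (requests, lxml, urllib3, pymysql). The code is for study and reference only. If you'd rather skip the setup, download it directly: https://download.csdn.net/download/weixin_36691991/11032522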

import json
import os
import re
import time

import pymysql
import requests
import urllib3
import urllib3.exceptions
from lxml import etree

main_url = 'https://car.autohome.com.cn/javascript/NewSpecCompare.js'
photo_url = 'https://www.autohome.com.cn/grade/carhtml/'
type_type_url = "https://car.autohome.com.cn/duibi/ashx/specComparehandler.ashx?callback=jsonpCallback&type=1&seriesid="
http = urllib3.PoolManager()

# Fetch the brand list; the response is a JS assignment, so keep only the part between '=' and ';'
html = requests.get(main_url).text
data = re.findall(r'=(.*?);', html, re.S)[0]

# Cache the raw JSON under ./file/ and load it back
dir_string = '/file/'
folder = os.getcwd() + dir_string
if not os.path.exists(folder):
    os.makedirs(folder, mode=0o777)
with open(folder + "data.json", 'w', encoding='utf-8') as f:
    f.write(data)
with open(folder + "data.json", 'r', encoding='utf-8') as f:
    datas = json.loads(f.read())

for data in datas:
    brands = {}
    brands['name'] = data['N']
    brands['ini'] = data['L']

    # Get the logo link from the photo page for this brand's initial letter
    url = photo_url + brands['ini'] + "_photo.html"
    html = requests.get(url).text
    selecter = etree.HTML(html)
    imgs = selecter.xpath('//dl/dt/a/img/@src')
    titles = selecter.xpath('//dl/dt/div/a/text()')
    for title, img in zip(titles, imgs):
        if title == data['N']:
            # The src is protocol-relative (//host/path); give it an explicit scheme
            brands['img'] = 'https:' + img if img.startswith('//') else img

    types = []
    for tss in data['List']:
        for t in tss['List']:
            ts = {}
            ts['name'] = t['N']
            ts['seriesid'] = t['I']
            print(t['N'])

            # Get the models listed under this series (JSONP response)
            type_url = type_type_url + str(t['I'])
            type_json = requests.get(type_url).text
            type_json = re.findall(r'\({(.*?)}\)', type_json, re.S)[0]
            json_file = t['N'].replace('/', '')
            with open(folder + json_file + ".json", 'w+', encoding='utf-8') as f:
                f.write("{" + type_json + "}")
            # Use a separate name so the outer `datas` list is not shadowed
            with open(folder + json_file + ".json", 'r', encoding='utf-8') as f:
                series_data = json.loads(f.read())
            sl = []
            for ty_j in series_data['List']:
                for value in ty_j.values():
                    if isinstance(value, list):
                        for v in value:
                            sl.append(v['N'])
            ts['sl'] = sl
            types.append(ts)
    brands['type'] = types

    # Create the folder for logos
    dir_string = '/file/brand'
    folder1 = os.getcwd() + dir_string
    if not os.path.exists(folder1):
        os.makedirs(folder1, mode=0o777)

    # Download the logo
    heades = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"}
    try:
        try:
            req = http.request('GET', brands['img'], headers=heades)
            res = req.data
            file_name = folder1 + "/" + brands['name'] + ".png"
            with open(file_name, 'wb') as f:
                f.write(res)
            brands['img'] = file_name
            time.sleep(1)
        except urllib3.exceptions.LocationParseError as e:
            brands['img'] = ""
            print(e)
    except KeyError as e:
        # No logo on the photo page matched this brand name
        brands['img'] = ''

    # Write the brand, its series, and their models to MySQL
    conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='root', db='weiqing', charset='utf8')
    cursor = conn.cursor()
    print((brands['name'], brands['ini'], brands['img']))
    cursor.execute("insert into brand(name,ini,img)values(%s,%s,%s)", (brands['name'], brands['ini'], brands['img']))
    b_pid = cursor.lastrowid
    for m_t in brands['type']:
        print((b_pid, m_t['name']))
        cursor.execute("insert into type(b_id,name)values(%s,%s)", (b_pid, m_t['name']))
        t_pid = cursor.lastrowid
        try:
            for m_s in m_t['sl']:
                print((t_pid, m_s))
                cursor.execute("insert into slis(t_id,name)values(%s,%s)", (t_pid, m_s))
        except KeyError as e:
            print(e)
            cursor.execute("insert into slis(t_id,name)values(%s,%s)", (t_pid, ""))
    conn.commit()
    cursor.close()
    conn.close()
    print(brands['name'] + "====" + brands['ini'] + "======" + brands['img'])
exit()
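
The script writes each response to a file and reads it straight back before parsing. As a rough sketch, the same extraction can be done in memory. The helper names `extract_js_assignment`, `extract_jsonp`, and `collect_names` are hypothetical, and the sample strings are made up to mirror the two response shapes the regexes in the script assume (`var x = [...];` and `jsonpCallback({...})`); they are not real Autohome responses.

```python
import json


def extract_js_assignment(js_text):
    """Parse the JSON value on the right-hand side of `name = <json>;`."""
    # Split on the first '=' only, then drop the trailing semicolon.
    _, _, rhs = js_text.partition('=')
    return json.loads(rhs.strip().rstrip(';'))


def extract_jsonp(jsonp_text):
    """Parse the JSON argument of a JSONP call such as `callback({...})`."""
    start = jsonp_text.index('(') + 1
    end = jsonp_text.rindex(')')
    return json.loads(jsonp_text[start:end])


def collect_names(payload):
    """Collect every 'N' found in list-valued fields of payload['List']."""
    names = []
    for group in payload['List']:
        for value in group.values():
            if isinstance(value, list):
                names.extend(item['N'] for item in value)
    return names


# Made-up sample payloads, shaped like the responses the script expects
brands = extract_js_assignment('var demo = [{"L": "A", "N": "Audi", "List": []}];')
payload = extract_jsonp(
    'jsonpCallback({"List": [{"I": 1, "List": [{"N": "2019 1.4T"}, {"N": "2019 2.0T"}]}]})'
)
print(brands[0]['N'], collect_names(payload))
```

Slicing between the first `(` and the last `)` avoids the non-greedy regex cutting a payload short when a value happens to contain `})` or `;`.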

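
The inserts assume three tables, `brand`, `type`, and `slis`, linked through `b_id` and `t_id`, but the post never shows their definitions. The sketch below guesses at a minimal schema and uses the standard-library `sqlite3` module in place of MySQL and `pymysql`, so it runs without a server. The column types, the `id` columns, and the `save_brand` helper are assumptions, not the author's actual setup. Note that `sqlite3` uses `?` placeholders where `pymysql` uses `%s`.

```python
import sqlite3

# Assumed schema: only the column names used in the INSERT statements
# come from the script; everything else is a guess.
SCHEMA = """
CREATE TABLE brand (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, ini TEXT, img TEXT);
CREATE TABLE type  (id INTEGER PRIMARY KEY AUTOINCREMENT, b_id INTEGER, name TEXT);
CREATE TABLE slis  (id INTEGER PRIMARY KEY AUTOINCREMENT, t_id INTEGER, name TEXT);
"""


def save_brand(conn, brand):
    """Insert one brand, its series, and their models; return the brand id."""
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO brand(name, ini, img) VALUES (?, ?, ?)",
        (brand['name'], brand['ini'], brand.get('img', '')),
    )
    brand_id = cursor.lastrowid
    for series in brand['type']:
        cursor.execute(
            "INSERT INTO type(b_id, name) VALUES (?, ?)",
            (brand_id, series['name']),
        )
        series_id = cursor.lastrowid
        for model in series.get('sl', []):
            cursor.execute(
                "INSERT INTO slis(t_id, name) VALUES (?, ?)",
                (series_id, model),
            )
    conn.commit()
    return brand_id


conn = sqlite3.connect(':memory:')
conn.executescript(SCHEMA)

# Made-up record in the same shape as the `brands` dict built by the script
demo = {
    'name': 'Audi', 'ini': 'A', 'img': '',
    'type': [{'name': 'A3', 'sl': ['2019 1.4T', '2019 2.0T']}],
}
brand_id = save_brand(conn, demo)
rows = conn.execute(
    "SELECT b.name, t.name, s.name FROM brand b "
    "JOIN type t ON t.b_id = b.id JOIN slis s ON s.t_id = t.id ORDER BY s.id"
).fetchall()
print(rows)
```

Passing one connection into the helper, rather than opening and closing a connection for every brand as the script does, keeps the per-brand cost down to the inserts and a single commit.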