1. Goal

Scrape vehicle listings from multiple pages of Renrenche (人人车).

2. Analysis

2.1 Site analysis

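The data displayed on the page can be found by searching the page's HTML source, so the page is rendered statically: the listings arrive in the initial HTML response rather than being loaded afterwards by JavaScript.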
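The same check can be made repeatable with a few lines of code. This is only a sketch: it assumes you already have the raw HTML as a string (for example, copied from the browser's "view source"), and the function name `appears_in_source` and the sample HTML are hypothetical.

```python
def appears_in_source(html, samples):
    """Return True if every sample string seen on the rendered page is present in the raw HTML."""
    return all(sample in html for sample in samples)


# Hypothetical snippet standing in for the raw page source.
raw_html = (
    '<ul class="row-fluid list-row js-car-list">'
    '<li><a href="/car/1"><h3>Audi A4L 2017</h3></a></li>'
    '</ul>'
)

# Text found in the source: the page is likely static.
print(appears_in_source(raw_html, ["Audi A4L 2017"]))
# Text missing from the source: it is probably loaded by JavaScript.
print(appears_in_source(raw_html, ["BMW 3 Series"]))
```

If the check fails for text that is clearly visible in the browser, plain XPath on the response will not find it either, and the data source (usually an XHR endpoint) has to be located instead.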

3. Complete code

renrenche.py

import scrapy

from car.items import RrcItem


class RenrencheSpider(scrapy.Spider):
    name = 'renrenche'
    allowed_domains = ['www.renrenche.com']
    start_urls = ['https://www.renrenche.com/bj/ershouche/?&plog_id=618ab1bbf616cab93022afa088592885']
    base_url = 'https://www.renrenche.com'

    def parse(self, response):
        # Each listing is an <a> without a rel attribute inside the car list
        selector = response.xpath('//ul[contains(@class,"row-fluid list-row js-car-list")]/li/a[not(@rel)]')
        # print(len(selector))
        # print(selector)
        for car in selector:
            car_name = car.xpath('./h3/text()').extract_first()
            # Strip newlines and spaces from the price, then append the unit "万" (10,000 yuan)
            total_price = car.xpath('./div[contains(@class,"tags-box")]/div/text()').extract_first().replace("\n", "").replace(" ", "") + "万"
            down_pay = car.xpath('./div[contains(@class,"tags-box")]/div/div/div/text()').extract_first()
            # Link to the detail page (extracted but not stored in the item)
            car_detail = car.xpath('./@href').extract_first()

            car_item = RrcItem()
            car_item['car_name'] = car_name
            car_item['car_price'] = total_price
            car_item['down_pay'] = down_pay
            yield car_item

        # Read the class attribute of the last pagination <ul>;
        # the next page is requested only when that value is empty
        flag = response.xpath('//ul[contains(@class,"pagination js-pagination")][last()]/@class').extract_first()
        if not flag:
            url = response.xpath('//ul[contains(@class,"pagination js-pagination")]/li[last()]/a/@href').extract_first()
            # Skip the request when no link is found, since base_url + None would raise a TypeError
            if url:
                yield scrapy.Request(url=self.base_url + url, callback=self.parse)
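The price cleanup and the next-page URL handling in parse() can be made more defensive. The sketch below is not part of the original spider: `clean_price` and `next_page_url` are hypothetical helpers, and the sample price and the path /bj/ershouche/p2/ are made up for illustration. It uses only the standard library, so it can be tried outside Scrapy. Unlike the original replace() calls, `clean_price` removes all whitespace, including tabs.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.renrenche.com"  # same value as base_url in the spider


def clean_price(raw):
    """Remove all whitespace from the scraped price text and append the unit "万"."""
    if raw is None:  # extract_first() returns None when the XPath matches nothing
        return None
    return "".join(raw.split()) + "万"


def next_page_url(href):
    """Build an absolute URL from a pagination link, or return None if there is no link."""
    if not href:
        return None
    return urljoin(BASE_URL, href)


print(clean_price("\n  12.58 \n"))
print(next_page_url("/bj/ershouche/p2/"))
```

Inside a spider, response.urljoin(href) does the same job relative to the current page, so a separate base_url attribute is not strictly needed.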

pipelines.py

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

import MySQLdb
# useful for handling different item types with a single interface
from itemadapter import ItemAdapter

from car.spiders.renrenche import RenrencheSpider


class CarPipeline:
    def process_item(self, item, spider):
        return item


class RrcPipeline:
    def open_spider(self, spider):
        # Set the connection encoding explicitly (see pitfall 1 in section 4)
        conn = MySQLdb.Connect(host='localhost', user='root', password='6666', port=3306,
                               database='maiche', charset='utf8')
        cursor = conn.cursor()
        self.conn = conn
        self.cursor = cursor

    def process_item(self, item, spider):
        if isinstance(spider, RenrencheSpider):
            # Parameterized query: the driver escapes the values,
            # so a quote inside a car name cannot break the SQL statement
            self.cursor.execute(
                "insert into car(carname,totalprice,downpay) values(%s,%s,%s);",
                (item.get('car_name'), item.get('car_price'), item.get('down_pay')),
            )
            self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
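As the comment at the top of pipelines.py says, a pipeline only runs once it is registered in the ITEM_PIPELINES setting. A minimal sketch of the relevant settings.py entry follows. The import path car.pipelines.RrcPipeline is inferred from the project imports above (package car, file pipelines.py), and the priority 300 is an arbitrary choice, not a value taken from the original project.

```python
# settings.py (fragment)
# Keys are import paths of pipeline classes; values are priorities in the 0-1000 range,
# and pipelines with lower values run first.
ITEM_PIPELINES = {
    "car.pipelines.RrcPipeline": 300,
}
```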

4. Pitfalls encountered

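1. The database connection was initially created without specifying a character encoding. The fix is to pass charset='utf8' to MySQLdb.Connect, as done in pipelines.py above.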
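Why the encoding matters can be illustrated without a database. The sketch below assumes that, without an explicit charset, the connection falls back to a single-byte encoding such as latin-1; the actual default depends on the MySQL client library and server configuration. The sample car name is made up.

```python
car_name = "奥迪A4L 2017款"  # hypothetical scraped value containing Chinese characters

# UTF-8 can represent the text, so it survives an encode/decode round trip.
round_trip_ok = car_name.encode("utf-8").decode("utf-8") == car_name

# A single-byte encoding such as latin-1 cannot represent Chinese characters.
try:
    car_name.encode("latin-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(round_trip_ok, encodable)
```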

