Scraping attraction information with Python: using Pyspider to crawl TripAdvisor attraction data

First, a look at the result.

Now the code:

from pyspider.libs.base_handler import *
import pymongo


class Handler(BaseHandler):
    crawl_config = {
    }

    # MongoDB connection shared by all callbacks (local server, default port)
    client = pymongo.MongoClient('localhost')
    db = client['TripAdvise']

    @every(minutes=24 * 60)
    def on_start(self):
        # Entry point: the listing of London attractions
        self.crawl('https://www.tripadvisor.cn/Attractions-g186338-Activities-c47-London_England.html',
                   callback=self.index_page, validate_cert=False)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Queue the detail page of every attraction on this listing page
        for each in response.doc('div.listing_title > a').items():
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False)

        # Pagination: follow the "next" link only if the page has one
        next_url = response.doc('.pagination .nav.next').attr.href
        if next_url:
            self.crawl(next_url, callback=self.index_page, validate_cert=False)

    @config(priority=2)
    def detail_page(self, response):
        # Extract each field with a CSS selector; a missing element yields ''
        url = response.url
        name = response.doc('.h1').text()
        rating = response.doc('div.ratingContainer > a > span').text()
        grade = response.doc('div.section.rating > span').text()
        address = response.doc('.contactInfo > .address').text()
        phone = response.doc('div.contact > div.contactType.phone.is-hidden-mobile > div').text()
        opening = response.doc('div.prw_rup.prw_common_atf_header_bl_responsive.headerBL > div > span').text()
        introduction = response.doc('.centerWell > div > div > div > div > div > span').text()

        return {
            "url": url,
            "name": name,
            "rating": rating,
            "grade": grade,
            "address": address,
            "phone": phone,
            "opening": opening,
            "introduction": introduction
        }

    def on_result(self, result):
        # Called for every callback; only detail_page returns a result
        if result:
            self.save_to_mongo(result)

    def save_to_mongo(self, result):
        # insert_one replaces Collection.insert, which was removed in PyMongo 4
        if self.db['london'].insert_one(result):
            print('Saved to MongoDB successfully', result)
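
The handler above always inserts, so crawling the same attraction again (for example after restarting the project) would store it twice. One possible way around that, sketched here as an assumption rather than as part of the original code, is to key each record on its `url` and upsert instead. The helper name `build_upsert` is hypothetical; it only builds the filter and update documents, which could then be passed to PyMongo's `update_one` with `upsert=True`.

```python
def build_upsert(result):
    """Return (filter, update) documents for an upsert keyed on the page URL.

    Hypothetical helper, not part of the original handler. Empty fields are
    dropped so that a page missing, say, a phone number does not overwrite
    a previously stored value with an empty string.
    """
    if not result or not result.get("url"):
        raise ValueError("result must contain a non-empty 'url'")
    # 'url' is always non-empty here, so the $set document is never empty
    fields = {key: value for key, value in result.items() if value}
    return {"url": result["url"]}, {"$set": fields}


# Possible use inside the handler (assumes the same `db` attribute):
#     query, update = build_upsert(result)
#     self.db['london'].update_one(query, update, upsert=True)
```

With this approach a repeated crawl of the same URL would update the existing document rather than create a new one.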
