Scraping hotel information with Python: a crawler practice project

import re
import time

import requests
from bs4 import BeautifulSoup

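# A single results page, useful for testing getAttractions() on its own
url = 'http://search.qyer.com/hotel/89580_4.html'

# Pages 1 to 9 (the listing has at most 157 pages)
urls = ['http://search.qyer.com/hotel/89580_{}.html'.format(i) for i in range(1, 10)]

# Scraped hotel records accumulate here
infos = []

# print(urls)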

# Scrape every page in the list
def getAUrl(urls):
    for url in urls:
        getAttractions(url)
        # Running total of records collected so far
        print('--------------{}-----------------'.format(len(infos)))

# Scrape the data on a single page
def getAttractions(url, data=None):
    web_data = requests.get(url)
    time.sleep(2)  # pause between requests
    soup = BeautifulSoup(web_data.text, 'lxml')
    # print(soup)

    hotel_names = soup.select('ul.shHotelList.clearfix > li > h2 > a')
    hotel_images = soup.select('span[class="pic"] > a > img')
    hotel_points = soup.select('span[class="points"]')
    hotel_introduces = soup.select('p[class="comment"]')
    hotel_prices = soup.select('p[class="seemore"] > span > em')

    if data is None:
        for name, image, point, introduce, price in zip(
                hotel_names, hotel_images, hotel_points,
                hotel_introduces, hotel_prices):
            data = {
                'name': name.get_text().replace('\r\n', '').strip(),
                'image': image.get('src'),
                # First number found in the rating text
                'point': re.findall(r'-?\d+\.?\d*e?-?\d*?', point.get_text())[0],
                'introduce': introduce.get_text().replace('\r\n', '').strip(),
                'price': int(price.get_text())
            }
            # print(data)
            infos.append(data)

# Sort by price, highest first
def getInfosByPrice(infos=infos):
    infos = sorted(infos, key=lambda info: info['price'], reverse=True)
    for info in infos:
        print(info['price'], info['name'])

# getAttractions(url)
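The rating extraction and the price sort can be tried without touching the network. The sketch below reuses the regular expression from getAttractions() and the sort from getInfosByPrice(). The helper name extract_rating, the sample rating strings, and the sample records are all hypothetical and only serve as an illustration.

```python
import re

# Same pattern as in getAttractions(); the first match is taken as the rating
RATING_PATTERN = r'-?\d+\.?\d*e?-?\d*?'


def extract_rating(text):
    """Return the first number found in text, as a string (hypothetical helper)."""
    return re.findall(RATING_PATTERN, text)[0]


# Hypothetical records shaped like the dicts appended to infos
sample_infos = [
    {'name': 'Hotel A', 'point': extract_rating('8.5 points'), 'price': 320},
    {'name': 'Hotel B', 'point': extract_rating('9.2 points'), 'price': 780},
    {'name': 'Hotel C', 'point': extract_rating('7 points'), 'price': 150},
]

# Same sort as getInfosByPrice(): most expensive first
by_price = sorted(sample_infos, key=lambda info: info['price'], reverse=True)

for info in by_price:
    print(info['price'], info['name'], info['point'])
```

Note that findall(...)[0] raises IndexError when the text contains no number at all, so a real page with a missing rating would need a guard.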

Link to the scraped site

Problems encountered and how they were solved

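① strip() removes characters from the beginning and end of a string; lstrip() and rstrip() remove them from the left or right side only. By default these methods remove whitespace, including newlines, but a different set of characters can be passed as an argument.

② Whitespace in the middle of a string needs other tools, such as replace() or a regular expression.

③ strip() combines well with iteration: when reading many lines from a file, a generator expression can strip each line as it is read.

④ More advanced stripping may require the translate() method.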
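The four points above can be sketched with the standard library alone. The sample strings and the in-memory file (io.StringIO standing in for a real file) are assumptions made for the illustration, not data from the scraped site.

```python
import io
import re

raw = '  \r\n Hotel  Sakura \r\n '

# 1. strip() trims both ends; lstrip()/rstrip() trim one side only
both = raw.strip()
left = raw.lstrip()
right = raw.rstrip()
dashes = '---name---'.strip('-')  # a custom character set can be passed

# 2. Whitespace inside the string needs replace() or a regular expression
no_spaces = both.replace(' ', '')
single_spaced = re.sub(r'\s+', ' ', both)

# 3. A generator expression strips each line lazily as it is read
fake_file = io.StringIO('  first \n second  \n\n third\n')
stripped = (line.strip() for line in fake_file)
lines = [line for line in stripped if line]  # drop empty lines

# 4. translate() deletes or maps several characters in a single pass
table = str.maketrans({'\r': None, '\n': None, '\t': ' '})
cleaned = 'Hotel\tSakura\r\n'.translate(table)

print(repr(both), repr(single_spaced), lines, repr(cleaned))
```

In the scraper itself, name.get_text().replace('\r\n', '').strip() follows the same idea as points 1 and 2: replace() removes the embedded line breaks and strip() trims what is left at the ends.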
