Crawling Shanghai Metro stations with Python and planning travel routes

Source code: source link

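Requirements and goals

Site to crawl: the Baidu Baike entry for the Shanghai Metro (上海地铁)

The crawled data is stored in the following format: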

{
    '莘庄站': {
        'subway': ['上海地铁1号线', '上海地铁5号线'],
        'neibour': ['外环路站', '春申路站']
    },
    '莲花路站': {
        'subway': ['上海地铁1号线'],
        'neibour': ['外环路站', '锦江乐园站']
    }
}

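The data is stored in this format to make later processing easier: each station maps to the lines that serve it ('subway') and to its adjacent stations ('neibour').

With this data, a route can be planned from a given start station to a given destination.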
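To make the format concrete, here is a minimal sketch of how such a dictionary can be queried. The sample data is copied from the example above. The helper names `neighbours_of` and `is_transfer_station` are hypothetical and are not part of the original code.

```python
# Sample data in the format shown above (two stations only).
station_dict = {
    '莘庄站': {'subway': ['上海地铁1号线', '上海地铁5号线'],
               'neibour': ['外环路站', '春申路站']},
    '莲花路站': {'subway': ['上海地铁1号线'],
                 'neibour': ['外环路站', '锦江乐园站']},
}


def neighbours_of(graph, station):
    """Return the adjacent stations, or an empty list for an unknown station."""
    return graph.get(station, {}).get('neibour', [])


def is_transfer_station(graph, station):
    """A station served by more than one line is treated as a transfer station."""
    return len(graph.get(station, {}).get('subway', [])) > 1


print(neighbours_of(station_dict, '莘庄站'))
print(is_transfer_station(station_dict, '莘庄站'))
print(is_transfer_station(station_dict, '莲花路站'))
```

Because every station carries its own neighbour list, the dictionary can be used directly as an adjacency list for graph search.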

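Crawling steps:

Crawl the Baidu Baike entry for the Shanghai Metro and collect the name and link of each line

Crawl each line's detail page, extract the station information, and save it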
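The first step can be tried without touching the network. The sketch below is an illustration under stated assumptions, not the author's method: it uses the standard-library `html.parser` instead of `requests` and lxml XPath, and `SAMPLE_HTML`, the `/item/line-N` paths, and the class name `LineLinkParser` are invented stand-ins for the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'https://baike.baidu.com'

# Hypothetical, heavily simplified stand-in for the table on the overview page.
SAMPLE_HTML = '''
<table>
  <tr><td><a href="/item/line-1">上海地铁1号线</a></td></tr>
  <tr><td><a href="/item/line-2">上海地铁2号线</a></td></tr>
  <tr><td>no link in this cell</td></tr>
</table>
'''


class LineLinkParser(HTMLParser):
    """Collect (name, absolute_url) pairs for every <a href> inside a <td>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._td_depth = 0
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._td_depth += 1
        elif tag == 'a' and self._td_depth:
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        # Only collect text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._td_depth:
            self._td_depth -= 1
        elif tag == 'a' and self._href is not None:
            name = ''.join(self._text).strip()
            if name:
                # Relative hrefs are resolved against the site root.
                self.links.append((name, urljoin(BASE_URL, self._href)))
            self._href = None


parser = LineLinkParser()
parser.feed(SAMPLE_HTML)
for name, href in parser.links:
    print(name, href)
```

The second step repeats the same idea on each collected link, reading table rows instead of anchors.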

The crawler

from collections import defaultdict

import requests
from lxml import etree


class ShangHaiSubway:
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
        }
        self.url = 'https://baike.baidu.com/item/%E4%B8%8A%E6%B5%B7%E5%9C%B0%E9%93%81/1273732?fr=aladdin'
        # Unknown stations map to an empty list, so a falsy value means "not seen yet".
        self.station_dict = defaultdict(list)

    def get_xpath_obj(self, url):
        '''Fetch the URL and return an lxml tree that supports XPath queries.'''
        response = requests.get(url, headers=self.headers)
        text = response.content.decode()
        return etree.HTML(text)

    def get_subways(self, xpath_obj):
        '''Parse the overview page and yield (line name, line URL) pairs.'''
        subways = xpath_obj.xpath('//table[@log-set-param="table_view"][1]//tr/td//a')
        for subway in subways:
            name = subway.xpath('./text()')
            href = subway.xpath('./@href')
            if not all([name, href]):
                continue
            name = name[0]
            href = 'https://baike.baidu.com' + href[0]
            yield name, href

    def get_stations(self, subway_name, href):
        '''Visit a line's page, walk its stations in order, and build the final data.'''
        xpath_obj = self.get_xpath_obj(href)
        stations = xpath_obj.xpath('//table[@log-set-param="table_view"][1]//tr')
        before = None  # previous station on this line
        for station in stations:
            station = station.xpath('./td')
            if len(station) <= 1:
                continue
            flag = len(station)
            station = station[0].xpath('.//text()')
            if not station:
                continue
            # Skip header rows: "station name", "station", "inner loop".
            if station[0] in ['站名', '车站', '内圈']:
                continue
            # Skip two-column direction rows that start with "往" ("towards").
            if station[0].startswith('往') and flag == 2:
                continue
            station = station[0]
            if self.station_dict[station]:
                # Station already recorded from another line.
                self.station_dict[station]['subway'].append(subway_name)
                if before:
                    self.station_dict[station]['neibour'].append(before)
                    self.station_dict[before]['neibour'].append(station)
            else:
                # First time this station is seen.
                self.station_dict[station] = {
                    'subway': [subway_name],
                    'neibour': [before] if before else [],
                }
                if before:
                    self.station_dict[before]['neibour'].append(station)
            before = station

    def run(self):
        xpath_obj = self.get_xpath_obj(self.url)
        for name, href in self.get_subways(xpath_obj):
            self.get_stations(name, href)
        return self.station_dict
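The bookkeeping inside `get_stations` can be separated from the network and parsing code. The following is a minimal sketch rather than the author's implementation: `add_line` is a hypothetical helper, the station and line names are placeholders, and a plain dict with `setdefault` replaces `defaultdict(list)`. It also skips lines and neighbours that are already recorded, which the original code does not do.

```python
def add_line(graph, line_name, stations):
    """Add one line, given its stations in travel order, to the graph."""
    previous = None
    for station in stations:
        entry = graph.setdefault(station, {'subway': [], 'neibour': []})
        if line_name not in entry['subway']:
            entry['subway'].append(line_name)
        if previous is not None:
            # Link both directions, skipping links that already exist.
            if previous not in entry['neibour']:
                entry['neibour'].append(previous)
            if station not in graph[previous]['neibour']:
                graph[previous]['neibour'].append(station)
        previous = station
    return graph


graph = {}
add_line(graph, 'Line A', ['S1', 'S2', 'S3'])
add_line(graph, 'Line B', ['S4', 'S2', 'S5'])
print(graph['S2'])
```

Here `S2` ends up on both lines with four neighbours, which is the shape a transfer station takes in the crawled data.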

Planning a route from the crawled data

For a detailed explanation of the logic, see this blog post

def search(start, destination, connection_graph, sort_rule):
    '''
    Search for a route from start to destination.

    connection_graph: topology of the station network
    sort_rule: function that reorders the candidate routes after each expansion
    '''
    # All candidate routes found so far
    pathes = [[start]]
    visited = set()
    while pathes:
        # Take the candidate route at the front of the list
        path = pathes.pop(0)
        middle_point = path[-1]
        if middle_point in visited:
            continue
        if not connection_graph.get(middle_point):
            print('Invalid station name')
            break
        successors = connection_graph[middle_point]['neibour']
        for city in successors:
            if city in path:
                continue
            new_path = path + [city]
            pathes.append(new_path)
            if city == destination:
                return new_path
        visited.add(middle_point)
        pathes = sort_rule(pathes)


def least(pathes):
    '''Sort rule: fewest stations first.'''
    pathes.sort(key=len)
    return pathes
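When `sort_rule` is `least`, the search above always expands the shortest candidate first, which amounts to breadth-first search. For comparison, here is a minimal sketch of plain BFS with `collections.deque`, which avoids re-sorting the whole list on every round. The function name `bfs_shortest_path` and the toy graph are assumptions made for illustration; only the `'subway'`/`'neibour'` layout comes from the article.

```python
from collections import deque


def bfs_shortest_path(graph, start, destination):
    """Return a path with the fewest stations, or None if there is none."""
    if start not in graph or destination not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for nxt in graph[path[-1]]['neibour']:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Toy network in the same format as the crawled data.
toy = {
    'A': {'subway': ['L1'], 'neibour': ['B', 'C']},
    'B': {'subway': ['L1'], 'neibour': ['A', 'D']},
    'C': {'subway': ['L1', 'L2'], 'neibour': ['A', 'E']},
    'D': {'subway': ['L1'], 'neibour': ['B', 'E']},
    'E': {'subway': ['L2'], 'neibour': ['C', 'D']},
}
print(bfs_shortest_path(toy, 'A', 'E'))
```

Unlike this sketch, the article's `search` accepts an arbitrary `sort_rule`, so the same loop can rank candidates by something other than length.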

if __name__ == '__main__':
    shang_hai_subway = ShangHaiSubway()
    station_dict = shang_hai_subway.run()
    print(station_dict)
    path = search('杨高南路站', '莘庄站', station_dict, least)
    for s in path:
        print(s + str(station_dict[s]['subway']) + '↓')

The result is as follows:

杨高南路站['上海地铁7号线']↓
高科西路站['上海地铁6号线', '上海地铁7号线']↓
东明路站['上海地铁6号线', '上海地铁13号线']↓
成山路站['上海地铁8号线', '上海地铁13号线']↓
长清路站['上海地铁7号线', '上海地铁13号线']↓
后滩站['上海地铁7号线']↓
龙华中路站['上海地铁7号线', '上海地铁12号线']↓
龙华站['上海地铁11号线', '上海地铁12号线']↓
龙漕路站['上海地铁3号线', '上海地铁12号线']↓
石龙路站['上海地铁3号线']↓
上海南站站['上海地铁1号线', '上海地铁3号线']↓
锦江乐园站['上海地铁1号线']↓
莲花路站['上海地铁1号线']↓
外环路站['上海地铁1号线']↓
莘庄站['上海地铁1号线', '上海地铁5号线']↓

This output is not good enough: it gives no transfer hints. Let's improve it:

if __name__ == '__main__':
    shang_hai_subway = ShangHaiSubway()
    station_dict = shang_hai_subway.run()
    print(station_dict)
    path = search('杨高南路站', '莘庄站', station_dict, least)
    subway = None  # line we are currently riding
    for i in range(len(path) - 1):
        station = path[i]
        if len(station_dict[station]['subway']) == 1:
            subway = station_dict[station]['subway'][0]
        else:
            # Lines shared by this station and the next one on the route.
            # Note: pop() picks an arbitrary line if several are shared
            # and raises KeyError if none is shared.
            new_subway = set(station_dict[path[i + 1]]['subway']) & set(station_dict[path[i]]['subway'])
            new_subway = new_subway.pop()
            if not subway:
                subway = new_subway
            elif new_subway != subway:
                print('Transfer to', new_subway)
                subway = new_subway
        print(station, subway)
    print(path[-1], subway)

The result is as follows:

杨高南路站 上海地铁7号线
Transfer to 上海地铁6号线
高科西路站 上海地铁6号线
Transfer to 上海地铁13号线
东明路站 上海地铁13号线
成山路站 上海地铁13号线
Transfer to 上海地铁7号线
长清路站 上海地铁7号线
后滩站 上海地铁7号线
Transfer to 上海地铁12号线
龙华中路站 上海地铁12号线
龙华站 上海地铁12号线
Transfer to 上海地铁3号线
龙漕路站 上海地铁3号线
石龙路站 上海地铁3号线
Transfer to 上海地铁1号线
上海南站站 上海地铁1号线
锦江乐园站 上海地铁1号线
莲花路站 上海地铁1号线
外环路站 上海地铁1号线
莘庄站 上海地铁1号线

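As you can see, we could have stayed on Line 7 without transferring, but our sort rule minimizes the total number of stations, not the number of transfers. Amap (高德地图) used to offer both strategies. I have since left Shanghai. Now let's define a sort rule that minimizes transfers: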

def least_change(pathes):
    '''Sort rule: fewest transfers first. Reads the module-level station_dict.'''
    def get_change_times(path):
        subway = None  # line we are currently riding
        times = 0      # number of transfers so far
        for i in range(len(path) - 1):
            station = path[i]
            if len(station_dict[station]['subway']) == 1:
                subway = station_dict[station]['subway'][0]
            else:
                # Lines shared by this station and the next one on the route
                new_subway = set(station_dict[path[i + 1]]['subway']) & set(station_dict[path[i]]['subway'])
                new_subway = new_subway.pop()
                if not subway:
                    subway = new_subway
                elif new_subway != subway:
                    times += 1
                    subway = new_subway
        return times

    pathes.sort(key=get_change_times)
    return pathes

With the new sort rule, the result is as follows:

杨高南路站 上海地铁7号线
高科西路站 上海地铁7号线
云台路站 上海地铁7号线
耀华路站 上海地铁7号线
长清路站 上海地铁7号线
后滩站 上海地铁7号线
Transfer to 上海地铁12号线
龙华中路站 上海地铁12号线
龙华站 上海地铁12号线
龙漕路站 上海地铁12号线
Transfer to 上海地铁1号线
漕宝路站 上海地铁1号线
上海南站站 上海地铁1号线
锦江乐园站 上海地铁1号线
莲花路站 上海地铁1号线
外环路站 上海地铁1号线
莘庄站 上海地铁1号线
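The transfer count can also be computed without relying on a module-level `station_dict`, and ties between routes with equally many transfers can be broken by length. The sketch below is a variation under stated assumptions, not the author's code: `count_transfers` and `make_sort_rule` are hypothetical names, the toy graph is invented, and the counting rule (stay on the current line whenever the next hop allows it) differs slightly from `get_change_times`.

```python
def count_transfers(path, graph):
    """Count line changes along a path, staying on the current line when possible."""
    current = None
    transfers = 0
    for a, b in zip(path, path[1:]):
        shared = set(graph[a]['subway']) & set(graph[b]['subway'])
        if not shared:
            continue  # Inconsistent data: neighbours without a common line.
        if current in shared:
            continue  # This hop can be ridden on the current line.
        if current is not None:
            transfers += 1
        current = sorted(shared)[0]  # Deterministic pick among shared lines.
    return transfers


def make_sort_rule(graph):
    """Build a sort rule: fewest transfers first, then fewest stations."""
    def sort_rule(pathes):
        pathes.sort(key=lambda p: (count_transfers(p, graph), len(p)))
        return pathes
    return sort_rule


# Toy network in the same format as the crawled data.
toy = {
    'A': {'subway': ['L1'], 'neibour': ['B']},
    'B': {'subway': ['L1', 'L2'], 'neibour': ['A', 'C', 'D']},
    'C': {'subway': ['L2'], 'neibour': ['B']},
    'D': {'subway': ['L1'], 'neibour': ['B']},
}
rule = make_sort_rule(toy)
print(rule([['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B']]))
```

With the article's `search`, such a rule could be passed as `search(start, destination, station_dict, make_sort_rule(station_dict))`. Because `search` returns as soon as the destination is first reached and prunes visited stations, the route it returns may not be the one with the globally fewest transfers.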

Original article: https://blog.csdn.net/weixin_44673043/article/details/107496631
