Goal: scrape train listings from Ctrip (trains.ctrip.com)

Trains: one-way trips and transfer trips

One-way
url="https://trains.ctrip.com/trainbooking/search?tocn=%25e5%258d%2583%25e5%25b2%259b%25e6%25b9%2596&fromcn=%25e6%259d%25ad%25e5%25b7%259e&day=2020-12-31"
Transfer
url="https://trains.ctrip.com/pages/booking/hubSingleTrip?ticketType=2&fromCn=%25E6%259D%25AD%25E5%25B7%259E&toCn=%25E5%258D%2583%25E5%25B2%259B%25E6%25B9%2596&departDate=2020-12-31"
parse.quote() handles the URL encoding
the csv module saves the scraped data
random.choice picks a User-Agent from a pool for each request, which I consider a good habit
the one-way listings sit in the page source of the original URL
the transfer listings come back as JSON from a separate endpoint (js_url)
A minimal sketch of the encoding follows.
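Here is that sketch, showing how repeated parse.quote() calls produce the doubly and triply encoded strings seen in the URLs above (standard library only; 杭州 is the departure city from the example URLs):

    from urllib import parse

    once = parse.quote("杭州")   # %E6%9D%AD%E5%B7%9E
    twice = parse.quote(once)    # %25E6%259D%25AD%25E5%25B7%259E, the fromCn value in the page URL
    thrice = parse.quote(twice)  # %2525E6%25259D%2525AD%2525E5%2525B7%25259E, the departureStation value in js_url
    # unquote the same number of times to recover the city name
    print(parse.unquote(parse.unquote(twice)))  # 杭州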

LET'S GO

url="https://trains.ctrip.com/pages/booking/hubSingleTrip?ticketType=5&fromCn=%25E6%259D%25AD%25E5%25B7%259E&toCn=%25E6%2596%25B0%25E4%25B9%25A1&departDate=2020-12-30" #  携程单程火车原网址 查询参数 fromcn 出发站 tocn 目的站 departDate 日期
#原网页查询参数需要进行两次url编码(注意点1)
#携程单程信息在原网页源代码中
'''
url="https://trains.ctrip.com/pages/booking/hubSingleTrip?ticketType=2&fromCn=%25E6%259D%25AD%25E5%25B7%259E&toCn=%25E5%258D%2583%25E5%25B2%259B%25E6%25B9%2596&departDate=2020-12-31"
js_url="https://trains.ctrip.com/pages/booking/getTransferList?departureStation=%2525E6%25259D%2525AD%2525E5%2525B7%25259E&arrivalStation=%2525E6%252596%2525B0%2525E4%2525B9%2525A1&departDateStr=2020-12-30"
携程中转网址火车中中转信息保存在json文件中(js_url) 查询参数departureStation arrivalStation departDateStr
类似稍加自己比较即可发现
js_url查询参数需要进行三次url编码(注意点2)
'''from urllib import parse
import random
from bs4 import BeautifulSoup
import  csv
import os
import requests
# print(parse.unquote((parse.unquote("%25E6%259D%25AD%25E5%25B7%259E"))))
fromArea = input("出发站")
toArea = input("目的站")
date = input("年-月-日 :")
if not os.path.exists("D:/携程查找练习"):  # create the folder the CSV files are saved to
    os.mkdir("D:/携程查找练习")

UA = [  # User-Agent pool; random.choice picks one per request
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11",
    "Opera/9.25 (Windows NT 5.1; U; en)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12",
    "Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9",
    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 Chrome/16.0.912.77 Safari/535.7",
    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0",
]

class NewsByTransfer:  # scrapes the transfer (connection) listings
    def __init__(self):
        # reads the module-level input() values
        self.fromArea = fromArea
        self.toArea = toArea
        self.date = date

    def getOneJsUrl(self, fromArea, toArea, date):  # assemble js_url
        fromArea = parse.quote(parse.quote(fromArea))  # encoded twice, as the page URL expects
        departureStation = parse.quote(fromArea)       # third pass: js_url params are encoded three times
        toArea = parse.quote(parse.quote(toArea))
        arrivalStation = parse.quote(toArea)
        url = "https://trains.ctrip.com/pages/booking/hubSingleTrip?ticketType=5&fromCn=" + fromArea + "&toCn=" + toArea  # original page URL, kept for reference (not requested)
        js_url = "https://trains.ctrip.com/pages/booking/getTransferList?departureStation=" + departureStation + "&arrivalStation=" + arrivalStation
        js_url = js_url + "&departDateStr=" + date
        print(js_url)
        return js_url

    def getOneNews(self, js_url):  # fetch js_url and pull out the transfer data
        user_agent = random.choice(UA)
        text = requests.get(js_url, headers={"User-Agent": user_agent}).json()  # parse the JSON response into a dict
        transferList = text["data"]["transferList"]  # first hop: the list holding the transfer records
        csvList = []  # rows for csv.DictWriter to write later
        for oneTransfer in transferList:
            # print(oneTransfer)  # debug
            tranDict = {}
            tranDict["总出发站"] = oneTransfer["departStation"]
            tranDict["总目的站"] = oneTransfer["arriveStation"]
            tranDict["总信息"] = (oneTransfer["transferStation"] + "换乘 停留" + oneTransfer["transferTakeTime"]
                                + " 全程" + oneTransfer["totalRuntime"] + " 价格" + oneTransfer["showPriceText"])
            for trainTransferInfos in oneTransfer["trainTransferInfos"]:  # one entry per leg of the trip
                tranDict[f"班次列车号{trainTransferInfos['sequence']}"] = trainTransferInfos["trainNo"]
                tranDict[f"发车时间-到站时间{trainTransferInfos['sequence']}"] = (trainTransferInfos["departDate"] + " "
                    + trainTransferInfos["departTime"] + "---" + trainTransferInfos["arriveDate"] + " " + trainTransferInfos["arriveTime"])
                tranDict[f"发车站-目的站{trainTransferInfos['sequence']}"] = trainTransferInfos["departStation"] + "---" + trainTransferInfos["arriveStation"]
            csvList.append(tranDict)
        print(csvList)
        return csvList

    def mkcsv(self, csvlist):  # write the rows to a CSV file
        # note: DictWriter takes its columns from the first row, so every row is assumed to have the same keys
        with open(f"D:/携程查找练习/{csvlist[0]['总出发站']}到{csvlist[0]['总目的站']}转站查找.csv", "w+", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, list(csvlist[0].keys()))
            writer.writeheader()
            writer.writerows(csvlist)

    def main(self):
        js_url = self.getOneJsUrl(self.fromArea, self.toArea, self.date)
        csvList = self.getOneNews(js_url)
        self.mkcsv(csvList)
        print(csvList)

class NewsBySingle:  # scrapes the one-way listings
    def __init__(self):
        self.fromArea = fromArea
        self.toArea = toArea
        self.date = date

    def getOneUrl(self, fromArea, toArea, date):  # assemble the one-way search URL
        fromArea = parse.quote(parse.quote(fromArea))  # encoded twice
        toArea = parse.quote(parse.quote(toArea))
        url = "https://trains.ctrip.com/trainBooking/search?ticketType=0&fromCn=" + fromArea + "&toCn=" + toArea + "&day=" + date + "&mkt_header=&orderSource="
        print(url)
        return url

    def getOneNews(self, url):  # fetch the page source and parse it with bs4
        user_agent = random.choice(UA)
        text = requests.get(url, headers={"User-Agent": user_agent}).content.decode("utf-8")  # page source for bs4
        # print(text)  # debug: dump the raw page source
        soup = BeautifulSoup(text, "lxml")
        oneTripList = soup.select("div.railway_list")  # one div per train
        print(len(oneTripList))
        oneTripNewList = []
        for oneTrip in oneTripList:
            # print(oneTrip)  # debug
            oneTripDict = {}
            oneTripDict["班次列车号"] = oneTrip.select("strong")[0].string
            oneTripDict["出发站名称"] = oneTrip.select("span")[0].string
            oneTripDict["出发站时间"] = oneTrip.select("strong")[1].string
            oneTripDict["中途时间"] = list(oneTrip.select("div.haoshi")[0].stripped_strings)[0]
            oneTripDict["目的站名称"] = oneTrip.select("span")[1].string
            oneTripDict["到站时间"] = oneTrip.select("strong")[2].string
            print(oneTripDict)
            oneTripNewList.append(oneTripDict)
            print("---" * 60)
        print(oneTripNewList)
        return oneTripNewList

    def mkcsv(self, oneTripNewList):
        with open(f"D:/携程查找练习/{oneTripNewList[0]['出发站名称']}到{oneTripNewList[0]['目的站名称']}单程查找.csv", "w+", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, list(oneTripNewList[0].keys()))
            writer.writeheader()
            writer.writerows(oneTripNewList)

    def main(self):
        url = self.getOneUrl(self.fromArea, self.toArea, self.date)
        oneTripNewList = self.getOneNews(url)
        self.mkcsv(oneTripNewList)
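For reference, this is the rough shape of the getTransferList JSON that getOneNews relies on. The field names are exactly the ones indexed in the code above; the values shown are placeholders, not real Ctrip data:

    # hypothetical response skeleton, inferred from the keys the code reads
    {
        "data": {
            "transferList": [
                {
                    "departStation": "...",     # overall departure station
                    "arriveStation": "...",     # overall arrival station
                    "transferStation": "...",   # station where you change trains
                    "transferTakeTime": "...",  # layover duration
                    "totalRuntime": "...",      # total travel time
                    "showPriceText": "...",     # displayed price
                    "trainTransferInfos": [     # one entry per leg
                        {"sequence": 1, "trainNo": "...",
                         "departDate": "...", "departTime": "...",
                         "arriveDate": "...", "arriveTime": "...",
                         "departStation": "...", "arriveStation": "..."},
                        # ...second leg with "sequence": 2...
                    ]
                }
            ]
        }
    }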
NewsByTransfer().main()  # run the transfer search
NewsBySingle().main()    # run the one-way search
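To try it, feed in the stations from the example URLs: 杭州 as the departure station, 千岛湖 or 新乡 as the destination, and a date such as 2020-12-31; the two CSV files end up in D:/携程查找练习. One caveat: the CSS selectors (div.railway_list, div.haoshi) and the JSON field names match what Ctrip served when this was written, so if the site changes, they will need updating.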

To wrap up: none of this is hard once the basics are in place: Python classes, bs4 parsing (which I find more intuitive than XPath), and a quick look at the XHR requests in the browser's Network tab. If this helped, a like would be appreciated; no need to follow, a like is enough.
