Recently a project required me to learn some Python web scraping, so I'm sharing the difficulties I ran into and the experience I gained along the way.
  Let's look at the final program output first:

  {"website": "<a href="http://www.somboonseafood.com/" target="_blank" rel="nofollow">http://www.somboonseafood.com/</a>", "comment": ["进去里面已经人满为患,服务生来往都是急匆匆的。我们前面还有一桌外国人在等位子。好在等待的时间不长,很快我们被带到了二楼。菜单上有中英文的翻译。我们除了必点的咖喱蟹,还点了腰果鸡肉,酸辣鱿鱼,芒果糯米饭和冬阴功汤。建兴比较好的是菜品都有小份的,适合2人吃的。这顿饭具体花了多少泰铢不记得了,反正折合人民币二百多吧。他家不能拉卡,只能付现金哦~", "http://b3-q.mafengwo.net/s8/M00/4B/D5/wKgBpVXxM4aAdrXbACreEebl8Ug36.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://a1-q.mafengwo.net/s8/M00/4B/E8/wKgBpVXxM5GAD-uAAAuCjK25BIo42.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n3-q.mafengwo.net/s8/M00/4B/EC/wKgBpVXxM5KAMPIxAAz_DjXUweA78.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "我们四个人点了红油咖喱蟹,粉丝闷虾,炒含羞草,还有芒果汁,柠檬汁。咖喱蟹很好吃,炒的很香很入味,如果将那红油用来拌饭,味道肯定很赞;粉丝闷虾也不错,四个人吃刚刚好;含羞草就有点老了,除此之外还有个酱油蒸石斑鱼,按斤卖的,一条快一千多了,不过肉质很劲道,吃多来还能塞牙缝呢,真的很新鲜", "http://a1-q.mafengwo.net/s8/M00/FD/32/wKgBpVXsL3eAb2oLAAs12tssU2Y97.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://c3-q.mafengwo.net/s8/M00/FD/3C/wKgBpVXsL4OAOf0zAAjok0qVt-406.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://c1-q.mafengwo.net/s8/M00/FD/46/wKgBpVXsL5CAT3kGAAn4-5VHSAg78.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "咖喱螃蟹不错,就是螃蟹少了鸡蛋多了哈哈哈,感觉最好吃的是我们随便点的虾子,炸得超级脆然后上面裹的粉好好吃。三个菜加一瓶矿泉水1000多株,感觉有点小贵,因为感觉没有传说中的那么那么好吃哈哈哈", "http://a2-q.mafengwo.net/s8/M00/78/4D/wKgBpVXYk4yAV9M9ABim-ixW7lg98.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://a2-q.mafengwo.net/s8/M00/78/52/wKgBpVXYk5CAbjcKABwM1hcyCZU62.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://b2-q.mafengwo.net/s8/M00/78/57/wKgBpVXYk5SAcFR0ABsEFh2YADQ90.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "他们这边的咖喱跟我们平时吃的不一样,偏甜一点!", "这一顿才化了1000B多点,这里是不能刷卡的,所以记得带好现金再去!", "http://b1-q.mafengwo.net/s8/M00/FE/BC/wKgBpVXdJmiAE16jAAW6XZkem8k36.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n3-q.mafengwo.net/s8/M00/FE/97/wKgBpVXdJlWAVXipAAGLzCC7YP400.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n1-q.mafengwo.net/s8/M00/FE/DD/wKgBpVXdJoCAaODiAAbf-o77ojA68.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "这顿饭是在曼谷吃的最贵的一餐,总共705铢。这家餐馆的味道也没有想象中多惊艳啦,发现其实泰国随便一家路边的拍档做的泰国菜味道都可以的。", "http://n2-q.mafengwo.net/s8/M00/14/52/wKgBpVXVzzKAS4rrAA2AHp_Mk3w39.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://a2-q.mafengwo.net/s8/M00/14/55/wKgBpVXVzzaAUAa8AAr8LPWGCSA46.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://c3-q.mafengwo.net/s8/M00/14/5A/wKgBpVXVzzmAd_D9AAuDW5yUmOI43.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "在各大攻略了声名显赫的他果然火爆,下午3点了还是排长龙!建议大家一定事先打电话预约哦!招牌菜咖喱蟹还行吧,总体上比其他泰餐还是强,但价格也确实不便宜。", "不过个人觉得是又贵又没有特色,连姐妹说的好吃到炸的咖哩蟹我个人觉得也没有米特拉的好吃,还不如一株粥(接下来会提到)。反正不建议去,当然也可以尝试一下被宰的感觉。", "http://n3-q.mafengwo.net/s8/M00/D0/5D/wKgBpVXKGSmAFS9sAAPmXxP4odQ66.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", 
"建兴酒家的菜也还行,在泰国物价里感觉应该不算便宜,尤其都是海鲜对比在普吉岛吃过的东西,一个天上一个地下。泰国特色咖喱蟹,口味跟日本咖喱不同,椰浆味道比较重,两人吃少要点就行,配泰国香米。", "http://a2-q.mafengwo.net/s8/M00/7E/32/wKgBpVXJz5qAZmKMAAJIskX2Kdw52.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "出发之前就知道建兴酒家很出名,可是一直以为离我们很远,不方便去吃。偶然发现原来SIAM站也有,但是位置真的很不好找,谷人希都找不到,已经被Siam Paragon,SiamCenter,Siam Discovery搞混乱了,当时已经饿的不行了,皇天不负有心人,终于还是找到它了,晚上六点多一点,门外已经两排凳子,排排坐了。记住是SIAM SQUARE ONE,大家去之前请做好功课,在SIAM CENTER的对面。而且最后就提前预约一下,我们等了快一个小时就才有位置,也许是享受美食,很久才会走一台。在等待的时候就已经把餐盘翻穿了,一坐下,不用等待,立刻点餐。完全忘记我们只有两个人在作战!除了海鲜拼盘不好吃,其他都一级棒!海鲜拼盘的那个蘸料太奇怪了,又酸又辣还是绿色的。", "http://n3-q.mafengwo.net/s8/M00/A8/40/wKgBpVXJ_ZmAeOxBAAjwGDzYEXg22.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://b3-q.mafengwo.net/s8/M00/A8/5A/wKgBpVXJ_a-ACPTQAAu-qwezLm881.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://b1-q.mafengwo.net/s8/M00/A8/6A/wKgBpVXJ_byAUD0UAArQl-EOgLw63.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "主打菜是咖喱蟹,确实很不错。味道偏甜,多吃会腻。", "建兴酒家泰国菜比较正宗,海鲜很新鲜。点着那泰式咖喱蟹,大虾冬阴功,不知名的某鱼还有什么泰式的蔬菜等,一边吃的欢,一边感慨:跟着攻略走,果然美味不会错!等到结账买单时,服务员上来账单一看,5800多铢,傻眼了!", "建兴酒家(CENTRAL AMBASSY店)咖喱蟹真是太棒了,咖喱蟹加了蛋黄,非常好吃。", "http://n3-q.mafengwo.net/s8/M00/ED/38/wKgBpVXFfluASZmjAAIP8zhLkns30.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n3-q.mafengwo.net/s8/M00/ED/6B/wKgBpVXFfoaARyp6AAG0hzU8HJY32.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n2-q.mafengwo.net/s8/M00/E6/A1/wKgBpVXB97WASTNwAAsINW94q5M85.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "建兴酒家的咖喱蟹确实好吃,也不贵。", "传说中的咖喱蟹出名的酒家。游记提到说不要轻易打的告诉司机去建兴酒家,因为可能会带你去山寨店,然后狠狠的砍。所以,请提前上建兴酒家的官网查询好具体地址,然后查找好附近的BTS,自行解决吧!价格略贵,不过味道很好!咖喱蟹诚心推荐。"], "opentime": "openTime", "description": "由华人创立的建兴酒家是曼谷一家老字号的海鲜餐厅,烹饪融合粤菜和泰国菜技法,国人比较容易接受。咖喱蟹是这里的招牌菜,炒含羞草、粉丝虾煲、蒜蓉虾也是这里的推荐菜。建兴酒家在曼谷有七家店,其中Samyan、Central Embassy、Siam Square One店是中午营业的,其他店的营业时间都是16:00-23:30。", "travel": ["/i/1008978.html", "/i/832751.html", "/i/1096198.html", "/i/811283.html", "/i/881575.html", "/i/885891.html", "/i/850595.html", "/i/962279.html", "/i/1058436.html", "/i/1250020.html", "/i/1290693.html", "/i/1285795.html", "/i/1161749.html", "/i/1078420.html", "/i/1136733.html", "/i/1008978.html", "/i/832751.html", "/i/1096198.html"], "telephone": "(66-02)2333104", "rate": "4.1", "location": "169, 169/7-12 Surawong Rd., Suriyawong, Bangrak, Bangkok 10500", "ticket": "ticket", "enname": "Somboon Seafood", "name": "建兴酒家(Surawong店) "
}

Choosing and Installing a Python IDE

For the IDE I chose PyCharm, which is quick and convenient. Download it here:
[http://www.jetbrains.com/pycharm/download/](http://www.jetbrains.com/pycharm/download/)
There are three ways to activate PyCharm: 1. buy a proper license (recommended); 2. take the free 30-day trial; 3. find an activation code online.
(The activation code below comes from the internet and is for learning and exchange purposes only.)
user name: EMBRACE
key: 14203-120420100000107Iq75C621P7X1SFnpJDivKnX6zcwYOYaGK3euO3ehd1MiTT"2!Jny8bff9VcTSJk7sRDLqKRVz1XGKbMqw3G

Regular Expressions

Before taking up scraping you also need a grounding in regular expressions, so here is a quick rundown of the basic symbols.

  The ones used most often are \d (digit), \w (word character), \W (non-word character), ., *, ? and +.
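
As a quick warm-up with those symbols, here is a minimal sketch using the re module (the sample text and patterns are made up for illustration, not taken from the crawler):

    # -*- coding: utf-8 -*-
    import re

    text = 'tel: 066-2233104, name: Somboon'
    print re.search(r'\d+-\d+', text).group()          # \d matches digits: 066-2233104
    print re.findall(r'\w+', 'Somboon Seafood')        # \w matches word characters
    print re.search(r'name:\s*(.*?)$', text).group(1)  # .*? is the non-greedy wildcard: Somboon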


The libraries needed are as follows:

  • re
  • urllib2
  • BeautifulSoup
  • json

    urllib2 fetches a page's HTML; on top of that you can extract data with re regular expressions or with BeautifulSoup, and the matched data is finally saved as JSON.
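
Before the real crawler, here is a toy sketch of how the four fit together, run against a placeholder URL (everything in it is illustrative; the actual selectors come later):

    # -*- coding: utf-8 -*-
    import re
    import json
    import urllib2
    from bs4 import BeautifulSoup  # install with: pip install beautifulsoup4

    html = urllib2.urlopen('http://example.com/').read()      # urllib2 fetches the HTML
    title = BeautifulSoup(html, 'html.parser').title.string   # BeautifulSoup parses the structure
    charset = re.search(r'charset="?([\w-]+)', html)          # re matches against the raw text
    print json.dumps({'title': title, 'charset': charset and charset.group(1)}, indent=1)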


Walking Through the Crawler Code

The first page we need to crawl is Mafengwo's search-results page for the city.

Its URL is www.mafengwo.cn/group/s.php?q=曼谷&p=1&t=cate&kt=1. The main parameters are q, p and t: q is the city name, p is the page number, and t is the category (cate means food); kt does not affect the result.
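
As an aside, the same URL can be assembled with urllib.urlencode, which also percent-encodes the Chinese city name (a hypothetical helper; the real code later simply concatenates strings):

    # -*- coding: utf-8 -*-
    import urllib

    def buildSearchURL(city, pageid, t='cate'):
        # q = city name, p = page number, t = category ('cate' food, 'hotel' hotels)
        query = urllib.urlencode([('q', city), ('p', pageid), ('t', t), ('kt', 1)])
        return 'http://www.mafengwo.cn/group/s.php?' + query

    print buildSearchURL('曼谷', 1)
    # -> http://www.mafengwo.cn/group/s.php?q=%E6%9B%BC%E8%B0%B7&p=1&t=cate&kt=1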

We first need a way to fetch such pages. In the function below, detailURL is everything after the domain name, so the function can retrieve any page under the site's domain:

    # Fetch a page below the domain
    def getDetailPage(detailURL):
        try:
            url = "http://www.mafengwo.cn" + detailURL
            request = urllib2.Request(url)
            response = urllib2.urlopen(request)   # build a Request object and open it
            page = response.read()                # read() returns the page HTML
            pageCode = re.sub(r'<br[ ]?/?>', '\n', page)  # replace <br> tags with newlines
            return pageCode
        except urllib2.URLError, e:
            if hasattr(e, "reason"):
                print e.reason
            return None
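
A quick way to exercise it, assuming the function above is defined (the printed length is just a sanity check):

    page = getDetailPage('/group/s.php?q=曼谷&p=1&t=cate&kt=1')
    if page:
        print len(page)   # size of the fetched HTML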

Next we collect the link of every restaurant. Inspect the element first to see where the links sit in the page:

    # Collect the shop links on one food search page
    def getFoodHref(self, pageid):
        url = "/group/s.php?q=" + self.city + "&p=" + str(pageid) + "&t=cate&kt=1"
        page = getDetailPage(url)                 # fetch the search page via getDetailPage
        soup = BeautifulSoup(page, 'html.parser') # parse it with BeautifulSoup
        FoodHref = []
        FoodLists = soup.find(name="div", attrs={'data-category': 'poi'}).ul
        FoodHrefList = FoodLists.find_all("h3")
        # every <h3> under <div class="_j_search_section" data-category="poi"> is one shop entry
        for FoodHrefs in FoodHrefList:
            FoodWebsite = FoodHrefs.a['href']     # the href of the <a> tag is the shop URL
            FoodHrefShort = str(FoodWebsite).replace('http://www.mafengwo.cn', '')
            # strip the domain so the path can later be passed back to getDetailPage
            FoodHref.append(FoodHrefShort)
        return FoodHref

Now call getDetailPage() again, this time passing in each FoodHref, to obtain the shop's own page and extract its details with BeautifulSoup.
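
In outline, assuming the methods above are collected on an MFW-style instance mfw (as they are in the complete code at the end), the driver loop looks roughly like this:

    for foodHref in mfw.getFoodHref(1):          # shop links from results page 1
        shopPage = mfw.getDetailPage(foodHref)   # fetch each shop's own page
        # ...extract the shop's fields from shopPage with BeautifulSoup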

  While scraping, though, I hit a problem: some shops carry a complete set of information fields, but others have no website or no transport details. What to do about those?

Inspecting the elements shows that the tags are identical in both cases, so there is no distinctive attribute or class value to aim at, and walking the child and sibling nodes of <div class="bd"> does not work either. Eventually I came up with an approach.

First write a matcher function, hasAttr, whose list parameter is the full Chinese label of one information field. Inside getShopInfo, loop over the label list and test each label against the content of the <div class="bd"> tag: True means the field exists on this page; otherwise try the next label. Taking the shop above as an example: matching 简介 (introduction) fails, matching 英文名称 (English name) fails, and so on until 地址 (address) succeeds, at which point we save the content of the tag that follows the address label; repeat until every field has been collected.

    # Check whether an information field exists on the page
    def hasAttr(self, page, list):
        soup = BeautifulSoup(page, 'html.parser')
        col = soup.find("div", class_="col-main").find("div", class_="bd")
        str_col = str(col)
        if list in str_col:
            return True
        else:
            return False

    # Scrape the shop's information fields
    def getShopInfo(self, page):
        shopInfoList = ['brief', 'localName', 'location', 'telephone', 'website', 'ticket', 'openTime', 'shopName', 'shopScore']
        infoItem = ['简介', '英文名称', '地址', '电话', '网址', '门票', '开放时间', '名字', '星评']
        soup = BeautifulSoup(page, 'html.parser')
        shopName = soup.find("div", class_="wrapper").h1.string
        shopScore = soup.find("div", class_="col-main").span.em.string
        for i in range(0, 7):                     # loop over the seven labeled fields
            if self.hasAttr(page, infoItem[i]):
                pattern_shopinfo = re.compile('<div class="col-main.*?<div class="bd">.*?' + infoItem[i] + '</h3>.*?>(.*?)</p>', re.S)
                shopInfos = re.findall(pattern_shopinfo, page)
                # the field exists, so pull out its tag content with a regex
                for shopInfo in shopInfos:
                    shopInfoList[i] = shopInfo
            else:
                continue                          # field missing: try the next label
        shopInfoList[7] = shopName
        shopInfoList[8] = shopScore
        return shopInfoList

Finally the data goes into a dictionary. Where one key maps to several values, e.g. dict = {a: []}, call setdefault(key, []).append(value):

    dict.setdefault('comment', []).append(comment)

Then print it with json.dumps(dict, indent=1).decode("unicode_escape"). The indent parameter lays the data out as a JSON tree, and when the content contains Chinese the decode("unicode_escape") is required, otherwise the output is all \uXXXX escape sequences.
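
A self-contained illustration of both tricks (the shop name and comments here are made-up sample data):

    # -*- coding: utf-8 -*-
    import json

    shop = {'name': u'建兴酒家'}
    shop.setdefault('comment', []).append(u'咖喱蟹很好吃')  # creates the list on first use
    shop.setdefault('comment', []).append(u'只能付现金')    # appends on later calls
    print json.dumps(shop, indent=1).decode('unicode_escape')
    # passing ensure_ascii=False to json.dumps is another common way to keep the Chinese readable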



Here is the complete code. Change the argument of the MFW() instance at the bottom to switch cities, and call saveFood() or saveIntertainment() to collect that city's restaurant or entertainment data respectively.

    #coding:utf-8
    import re
    import urllib2
    from bs4 import BeautifulSoup
    import json
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

    class MFW:
        def __init__(self, city):
            self.siteURL = 'http://www.mafengwo.cn'
            self.city = city
            self.cityDict = {'曼谷': '11045_518', '清迈': '15284_179', '普吉岛': '11047_858', '苏梅': '14210_686', '芭堤雅': '11046_940'}
            self.id = self.cityDict[self.city]
            self.user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
            self.headers = {'User-Agent': self.user_agent}

        # Collect the shop links on one food search page
        def getFoodHref(self, pageid):
            url = "/group/s.php?q=" + self.city + "&p=" + str(pageid) + "&t=cate&kt=1"
            page = self.getDetailPage(url)
            soup = BeautifulSoup(page, 'html.parser')
            FoodHref = []
            FoodLists = soup.find(name="div", attrs={'data-category': 'poi'}).ul
            FoodHrefList = FoodLists.find_all("h3")
            for FoodHrefs in FoodHrefList:
                FoodWebsite = FoodHrefs.a['href']
                FoodHrefShort = str(FoodWebsite).replace('http://www.mafengwo.cn', '')
                FoodHref.append(FoodHrefShort)
            return FoodHref

        # Collect the hotel links
        def getHotelHref(self, pageid):
            url = "/group/s.php?q=" + self.city + "&p=" + str(pageid) + "&t=hotel&kt=1"
            page = self.getDetailPage(url)
            soup = BeautifulSoup(page, 'html.parser')
            hotelHref = []
            hotelHrefLists = soup.find_all("div", class_="hot-about clearfix _j_hotel")
            for hotelHrefList in hotelHrefLists:
                hotelWebsite = hotelHrefList.a['href']
                hotelHrefShort = str(hotelWebsite).replace('http://www.mafengwo.cn', '')
                hotelHref.append(hotelHrefShort)
            return hotelHref

        # Fetch the HTML of the city's overview page
        def getPage(self):
            try:
                url = self.siteURL + "/baike/" + str(self.id) + ".html"
                request = urllib2.Request(url, headers=self.headers)
                response = urllib2.urlopen(request)
                page = response.read()
                pageCode = re.sub(r'<br[ ]?/?>', '\n', page)
                return pageCode
            except urllib2.URLError, e:
                if hasattr(e, "reason"):
                    print e.reason
                return None

        # Fetch the HTML of any page below the domain
        def getDetailPage(self, detailURL):
            try:
                shopURL = self.siteURL + detailURL
                response = urllib2.urlopen(shopURL)
                detailPage = response.read()
                detailPageCode = re.sub(r'<br[ ]?/?>', '\n', detailPage)
                return detailPageCode
            except urllib2.URLError, e:
                if hasattr(e, "reason"):
                    print e.reason
                return None

        # Collect the category list
        def getProject(self):
            page = self.getPage()
            soup = BeautifulSoup(page, 'html.parser')
            projectName = []
            projectId = {}
            projects = soup.find("div", class_="anchor-nav").stripped_strings
            for project in projects:
                projectName.append(project)
            for i in range(len(projectName)):
                projectId[i] = projectName[i]
            return projectId

        # Collect the shop link list
        def getShopHref(self):
            page = self.getPage()
            soup = BeautifulSoup(page, 'html.parser')
            list = soup.find_all("div", class_="poi-card clearfix")
            shopHref = []
            for items in list:
                shopitem = items.find_all("div", class_="item")
                for item in shopitem:
                    shopHref.append(item.a['href'])
            return shopHref

        # Scrape the comments (text plus image URLs)
        def getComment(self, page):
            soup = BeautifulSoup(page, 'html.parser')
            list = soup.find("div", class_="_j_commentlist")
            commentList = list.find_all("div", class_="comment-item")
            commentContent = []
            for item in commentList:
                commentContent.append(item.find('p').string)
                commentImas = item.find_all(name='img', attrs={'height': re.compile('.*?')})
                for commentIma in commentImas:
                    commentContent.append(commentIma.get('src'))
            return commentContent

        # Scrape the travel-note links
        def getTravel(self, page):
            soup = BeautifulSoup(page, 'html.parser')
            items = soup.find_all("li", class_="post-item clearfix")
            travelHref = []
            for item in items:
                travelHref.append(item.find('a').get('href'))
            return travelHref

        # Check whether an information field exists on the page
        def hasAttr(self, page, list):
            soup = BeautifulSoup(page, 'html.parser')
            col = soup.find("div", class_="col-main").find("div", class_="bd")
            str_col = str(col)
            if list in str_col:
                return True
            else:
                return False

        # Scrape the shop's information fields
        def getShopInfo(self, page):
            shopInfoList = ['brief', 'localName', 'location', 'telephone', 'website', 'ticket', 'openTime', 'shopName', 'shopScore']
            infoItem = ['简介', '英文名称', '地址', '电话', '网址', '门票', '开放时间', '名字', '星评']
            soup = BeautifulSoup(page, 'html.parser')
            shopName = soup.find("div", class_="wrapper").h1.string
            shopScore = soup.find("div", class_="col-main").span.em.string
            for i in range(0, 7):
                if self.hasAttr(page, infoItem[i]):
                    pattern_shopinfo = re.compile('<div class="col-main.*?<div class="bd">.*?' + infoItem[i] + '</h3>.*?>(.*?)</p>', re.S)
                    shopInfos = re.findall(pattern_shopinfo, page)
                    for shopInfo in shopInfos:
                        shopInfoList[i] = shopInfo
                else:
                    continue
            shopInfoList[7] = shopName
            shopInfoList[8] = shopScore
            return shopInfoList

        # Scrape and save the restaurant data
        def saveFood(self):
            f = open(r'****.txt', 'w')
            a = 0
            for i in range(51):
                try:
                    foodHrefList = self.getFoodHref(i)
                    for foodHref in foodHrefList:
                        page = self.getDetailPage(foodHref)
                        dict = {}.fromkeys(('description', 'enname', 'location', 'telephone', 'website', 'ticket', 'opentime', 'name', 'rate', 'comment', 'travel'))
                        shopInfos = self.getShopInfo(page)
                        dict['description'] = shopInfos[0]
                        dict['enname'] = shopInfos[1]
                        dict['location'] = shopInfos[2]
                        dict['telephone'] = shopInfos[3]
                        dict['website'] = shopInfos[4]
                        dict['ticket'] = shopInfos[5]
                        dict['opentime'] = shopInfos[6]
                        dict['name'] = shopInfos[7]
                        dict['rate'] = shopInfos[8]
                        comments = self.getComment(page)
                        dict['comment'] = comments
                        travels = self.getTravel(page)
                        dict['travel'] = travels
                        a += 1
                        print json.dumps(dict, indent=1).decode("unicode_escape")
                        print ("=================================================================================" + "\n")
                except AttributeError, e:
                    continue
            f.close()
            print "Finished, " + str(a) + " records in total"

        # Write out the entertainment data
        def saveIntertainment(self):
            f = open(r'****.txt', 'a')
            f.write('\nCity: ' + self.city + '\n\n\n')
            shopProjects = self.getProject()
            for i in shopProjects.keys():
                f.write(str(i) + str(shopProjects[i]) + '\n')
            shopHrefList = self.getShopHref()
            for shopHref in shopHrefList:
                try:
                    page = self.getDetailPage(shopHref)
                    shopInfos = self.getShopInfo(page)
                    for shopInfo in shopInfos:
                        f.write(str(shopInfo) + '\n')
                    comments = self.getComment(page)
                    for comment in comments:
                        f.write(str(comment) + '\n')
                    travels = self.getTravel(page)
                    for travel in travels:
                        f.write(str(travel) + '\n')
                    f.write("======================================================================================================================" + '\n')
                except AttributeError, e:
                    continue
            f.close()
            print "Finished"

    mfw = MFW('曼谷')
    mfw.saveFood()
