Recently a project required me to learn some Python web scraping, so I'm sharing the difficulties I ran into and the experience I gained along the way.
  Let's look at the final program output first:

  {"website": "<a href="http://www.somboonseafood.com/" target="_blank" rel="nofollow">http://www.somboonseafood.com/</a>", "comment": ["进去里面已经人满为患,服务生来往都是急匆匆的。我们前面还有一桌外国人在等位子。好在等待的时间不长,很快我们被带到了二楼。菜单上有中英文的翻译。我们除了必点的咖喱蟹,还点了腰果鸡肉,酸辣鱿鱼,芒果糯米饭和冬阴功汤。建兴比较好的是菜品都有小份的,适合2人吃的。这顿饭具体花了多少泰铢不记得了,反正折合人民币二百多吧。他家不能拉卡,只能付现金哦~", "http://b3-q.mafengwo.net/s8/M00/4B/D5/wKgBpVXxM4aAdrXbACreEebl8Ug36.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://a1-q.mafengwo.net/s8/M00/4B/E8/wKgBpVXxM5GAD-uAAAuCjK25BIo42.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n3-q.mafengwo.net/s8/M00/4B/EC/wKgBpVXxM5KAMPIxAAz_DjXUweA78.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "我们四个人点了红油咖喱蟹,粉丝闷虾,炒含羞草,还有芒果汁,柠檬汁。咖喱蟹很好吃,炒的很香很入味,如果将那红油用来拌饭,味道肯定很赞;粉丝闷虾也不错,四个人吃刚刚好;含羞草就有点老了,除此之外还有个酱油蒸石斑鱼,按斤卖的,一条快一千多了,不过肉质很劲道,吃多来还能塞牙缝呢,真的很新鲜", "http://a1-q.mafengwo.net/s8/M00/FD/32/wKgBpVXsL3eAb2oLAAs12tssU2Y97.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://c3-q.mafengwo.net/s8/M00/FD/3C/wKgBpVXsL4OAOf0zAAjok0qVt-406.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://c1-q.mafengwo.net/s8/M00/FD/46/wKgBpVXsL5CAT3kGAAn4-5VHSAg78.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "咖喱螃蟹不错,就是螃蟹少了鸡蛋多了哈哈哈,感觉最好吃的是我们随便点的虾子,炸得超级脆然后上面裹的粉好好吃。三个菜加一瓶矿泉水1000多株,感觉有点小贵,因为感觉没有传说中的那么那么好吃哈哈哈", "http://a2-q.mafengwo.net/s8/M00/78/4D/wKgBpVXYk4yAV9M9ABim-ixW7lg98.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://a2-q.mafengwo.net/s8/M00/78/52/wKgBpVXYk5CAbjcKABwM1hcyCZU62.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://b2-q.mafengwo.net/s8/M00/78/57/wKgBpVXYk5SAcFR0ABsEFh2YADQ90.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "他们这边的咖喱跟我们平时吃的不一样,偏甜一点!", "这一顿才化了1000B多点,这里是不能刷卡的,所以记得带好现金再去!", "http://b1-q.mafengwo.net/s8/M00/FE/BC/wKgBpVXdJmiAE16jAAW6XZkem8k36.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n3-q.mafengwo.net/s8/M00/FE/97/wKgBpVXdJlWAVXipAAGLzCC7YP400.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n1-q.mafengwo.net/s8/M00/FE/DD/wKgBpVXdJoCAaODiAAbf-o77ojA68.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "这顿饭是在曼谷吃的最贵的一餐,总共705铢。这家餐馆的味道也没有想象中多惊艳啦,发现其实泰国随便一家路边的拍档做的泰国菜味道都可以的。", "http://n2-q.mafengwo.net/s8/M00/14/52/wKgBpVXVzzKAS4rrAA2AHp_Mk3w39.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://a2-q.mafengwo.net/s8/M00/14/55/wKgBpVXVzzaAUAa8AAr8LPWGCSA46.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://c3-q.mafengwo.net/s8/M00/14/5A/wKgBpVXVzzmAd_D9AAuDW5yUmOI43.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "在各大攻略了声名显赫的他果然火爆,下午3点了还是排长龙!建议大家一定事先打电话预约哦!招牌菜咖喱蟹还行吧,总体上比其他泰餐还是强,但价格也确实不便宜。", "不过个人觉得是又贵又没有特色,连姐妹说的好吃到炸的咖哩蟹我个人觉得也没有米特拉的好吃,还不如一株粥(接下来会提到)。反正不建议去,当然也可以尝试一下被宰的感觉。", "http://n3-q.mafengwo.net/s8/M00/D0/5D/wKgBpVXKGSmAFS9sAAPmXxP4odQ66.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", 
"建兴酒家的菜也还行,在泰国物价里感觉应该不算便宜,尤其都是海鲜对比在普吉岛吃过的东西,一个天上一个地下。泰国特色咖喱蟹,口味跟日本咖喱不同,椰浆味道比较重,两人吃少要点就行,配泰国香米。", "http://a2-q.mafengwo.net/s8/M00/7E/32/wKgBpVXJz5qAZmKMAAJIskX2Kdw52.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "出发之前就知道建兴酒家很出名,可是一直以为离我们很远,不方便去吃。偶然发现原来SIAM站也有,但是位置真的很不好找,谷人希都找不到,已经被Siam Paragon,SiamCenter,Siam Discovery搞混乱了,当时已经饿的不行了,皇天不负有心人,终于还是找到它了,晚上六点多一点,门外已经两排凳子,排排坐了。记住是SIAM SQUARE ONE,大家去之前请做好功课,在SIAM CENTER的对面。而且最后就提前预约一下,我们等了快一个小时就才有位置,也许是享受美食,很久才会走一台。在等待的时候就已经把餐盘翻穿了,一坐下,不用等待,立刻点餐。完全忘记我们只有两个人在作战!除了海鲜拼盘不好吃,其他都一级棒!海鲜拼盘的那个蘸料太奇怪了,又酸又辣还是绿色的。", "http://n3-q.mafengwo.net/s8/M00/A8/40/wKgBpVXJ_ZmAeOxBAAjwGDzYEXg22.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://b3-q.mafengwo.net/s8/M00/A8/5A/wKgBpVXJ_a-ACPTQAAu-qwezLm881.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://b1-q.mafengwo.net/s8/M00/A8/6A/wKgBpVXJ_byAUD0UAArQl-EOgLw63.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "主打菜是咖喱蟹,确实很不错。味道偏甜,多吃会腻。", "建兴酒家泰国菜比较正宗,海鲜很新鲜。点着那泰式咖喱蟹,大虾冬阴功,不知名的某鱼还有什么泰式的蔬菜等,一边吃的欢,一边感慨:跟着攻略走,果然美味不会错!等到结账买单时,服务员上来账单一看,5800多铢,傻眼了!", "建兴酒家(CENTRAL AMBASSY店)咖喱蟹真是太棒了,咖喱蟹加了蛋黄,非常好吃。", "http://n3-q.mafengwo.net/s8/M00/ED/38/wKgBpVXFfluASZmjAAIP8zhLkns30.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n3-q.mafengwo.net/s8/M00/ED/6B/wKgBpVXFfoaARyp6AAG0hzU8HJY32.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "http://n2-q.mafengwo.net/s8/M00/E6/A1/wKgBpVXB97WASTNwAAsINW94q5M85.jpeg?imageMogr2%2Fthumbnail%2F%21200x150r%2Fgravity%2FCenter%2Fcrop%2F%21200x150%2Fquality%2F90", "建兴酒家的咖喱蟹确实好吃,也不贵。", "传说中的咖喱蟹出名的酒家。游记提到说不要轻易打的告诉司机去建兴酒家,因为可能会带你去山寨店,然后狠狠的砍。所以,请提前上建兴酒家的官网查询好具体地址,然后查找好附近的BTS,自行解决吧!价格略贵,不过味道很好!咖喱蟹诚心推荐。"], "opentime": "openTime", "description": "由华人创立的建兴酒家是曼谷一家老字号的海鲜餐厅,烹饪融合粤菜和泰国菜技法,国人比较容易接受。咖喱蟹是这里的招牌菜,炒含羞草、粉丝虾煲、蒜蓉虾也是这里的推荐菜。建兴酒家在曼谷有七家店,其中Samyan、Central Embassy、Siam Square One店是中午营业的,其他店的营业时间都是16:00-23:30。", "travel": ["/i/1008978.html", "/i/832751.html", "/i/1096198.html", "/i/811283.html", "/i/881575.html", "/i/885891.html", "/i/850595.html", "/i/962279.html", "/i/1058436.html", "/i/1250020.html", "/i/1290693.html", "/i/1285795.html", "/i/1161749.html", "/i/1078420.html", "/i/1136733.html", "/i/1008978.html", "/i/832751.html", "/i/1096198.html"], "telephone": "(66-02)2333104", "rate": "4.1", "location": "169, 169/7-12 Surawong Rd., Suriyawong, Bangrak, Bangkok 10500", "ticket": "ticket", "enname": "Somboon Seafood", "name": "建兴酒家(Surawong店) "
}

Choosing and Installing a Python IDE

For the IDE I chose PyCharm, which is quick and convenient. Download it here:
[http://www.jetbrains.com/pycharm/download/](http://www.jetbrains.com/pycharm/download/)
There are three ways to activate PyCharm: 1. buy a proper license (recommended); 2. take the free 30-day trial; 3. find an activation code online.
(The activation code below comes from the internet and is for learning and exchange purposes only.)
user name: EMBRACE
key: 14203-120420100000107Iq75C621P7X1SFnpJDivKnX6zcwYOYaGK3euO3ehd1MiTT"2!Jny8bff9VcTSJk7sRDLqKRVz1XGKbMqw3G

Regular Expressions

Before taking up scraping you also need a grounding in regular expressions, so here is a quick rundown of the basic symbols.

  The ones used most often are \d (digit), \w (word character), \W (non-word character), ., *, ? and +.
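
As a quick warm-up with those symbols, here is a minimal sketch using the re module (the sample text and patterns are made up for illustration, not taken from the crawler):

    # -*- coding: utf-8 -*-
    import re

    text = 'tel: 066-2233104, name: Somboon'
    print re.search(r'\d+-\d+', text).group()          # \d matches digits: 066-2233104
    print re.findall(r'\w+', 'Somboon Seafood')        # \w matches word characters
    print re.search(r'name:\s*(.*?)$', text).group(1)  # .*? is the non-greedy wildcard: Somboon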


The libraries needed are as follows:

  • re
  • urllib2
  • BeautifulSoup
  • json

    urllib2 fetches a page's HTML; on top of that you can extract data with re regular expressions or with BeautifulSoup, and the matched data is finally saved as JSON.
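
Before the real crawler, here is a toy sketch of how the four fit together, run against a placeholder URL (everything in it is illustrative; the actual selectors come later):

    # -*- coding: utf-8 -*-
    import re
    import json
    import urllib2
    from bs4 import BeautifulSoup  # install with: pip install beautifulsoup4

    html = urllib2.urlopen('http://example.com/').read()      # urllib2 fetches the HTML
    title = BeautifulSoup(html, 'html.parser').title.string   # BeautifulSoup parses the structure
    charset = re.search(r'charset="?([\w-]+)', html)          # re matches against the raw text
    print json.dumps({'title': title, 'charset': charset and charset.group(1)}, indent=1)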


Walking Through the Crawler Code

The first page we need to crawl is Mafengwo's search-results page for the city.

Its URL is www.mafengwo.cn/group/s.php?q=曼谷&p=1&t=cate&kt=1. The main parameters are q, p and t: q is the city name, p is the page number, and t is the category (cate means food); kt does not affect the result.
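
As an aside, the same URL can be assembled with urllib.urlencode, which also percent-encodes the Chinese city name (a hypothetical helper; the real code later simply concatenates strings):

    # -*- coding: utf-8 -*-
    import urllib

    def buildSearchURL(city, pageid, t='cate'):
        # q = city name, p = page number, t = category ('cate' food, 'hotel' hotels)
        query = urllib.urlencode([('q', city), ('p', pageid), ('t', t), ('kt', 1)])
        return 'http://www.mafengwo.cn/group/s.php?' + query

    print buildSearchURL('曼谷', 1)
    # -> http://www.mafengwo.cn/group/s.php?q=%E6%9B%BC%E8%B0%B7&p=1&t=cate&kt=1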

We first need a way to fetch such pages. In the function below, detailURL is everything after the domain name, so the function can retrieve any page under the site's domain:

    # Fetch a page below the domain
    def getDetailPage(detailURL):
        try:
            url = "http://www.mafengwo.cn" + detailURL
            request = urllib2.Request(url)
            response = urllib2.urlopen(request)   # build a Request object and open it
            page = response.read()                # read() returns the page HTML
            pageCode = re.sub(r'<br[ ]?/?>', '\n', page)  # replace <br> tags with newlines
            return pageCode
        except urllib2.URLError, e:
            if hasattr(e, "reason"):
                print e.reason
            return None
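
A quick way to exercise it, assuming the function above is defined (the printed length is just a sanity check):

    page = getDetailPage('/group/s.php?q=曼谷&p=1&t=cate&kt=1')
    if page:
        print len(page)   # size of the fetched HTML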

Next we collect the link of every restaurant. Inspect the element first to see where the links sit in the page:

    # Collect the shop links on one food search page
    def getFoodHref(self, pageid):
        url = "/group/s.php?q=" + self.city + "&p=" + str(pageid) + "&t=cate&kt=1"
        page = getDetailPage(url)                 # fetch the search page via getDetailPage
        soup = BeautifulSoup(page, 'html.parser') # parse it with BeautifulSoup
        FoodHref = []
        FoodLists = soup.find(name="div", attrs={'data-category': 'poi'}).ul
        FoodHrefList = FoodLists.find_all("h3")
        # every <h3> under <div class="_j_search_section" data-category="poi"> is one shop entry
        for FoodHrefs in FoodHrefList:
            FoodWebsite = FoodHrefs.a['href']     # the href of the <a> tag is the shop URL
            FoodHrefShort = str(FoodWebsite).replace('http://www.mafengwo.cn', '')
            # strip the domain so the path can later be passed back to getDetailPage
            FoodHref.append(FoodHrefShort)
        return FoodHref

Now call getDetailPage() again, this time passing in each FoodHref, to obtain the shop's own page and extract its details with BeautifulSoup.
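
In outline, assuming the methods above are collected on an MFW-style instance mfw (as they are in the complete code at the end), the driver loop looks roughly like this:

    for foodHref in mfw.getFoodHref(1):          # shop links from results page 1
        shopPage = mfw.getDetailPage(foodHref)   # fetch each shop's own page
        # ...extract the shop's fields from shopPage with BeautifulSoup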

  While scraping, though, I hit a problem: some shops carry a complete set of information fields, but others have no website or no transport details. What to do about those?

Inspecting the elements shows that the tags are identical in both cases, so there is no distinctive attribute or class value to aim at, and walking the child and sibling nodes of <div class="bd"> does not work either. Eventually I came up with an approach.

First write a matcher function, hasAttr, whose list parameter is the full Chinese label of one information field. Inside getShopInfo, loop over the label list and test each label against the content of the <div class="bd"> tag: True means the field exists on this page; otherwise try the next label. Taking the shop above as an example: matching 简介 (introduction) fails, matching 英文名称 (English name) fails, and so on until 地址 (address) succeeds, at which point we save the content of the tag that follows the address label; repeat until every field has been collected.

    # Check whether an information field exists on the page
    def hasAttr(self, page, list):
        soup = BeautifulSoup(page, 'html.parser')
        col = soup.find("div", class_="col-main").find("div", class_="bd")
        str_col = str(col)
        if list in str_col:
            return True
        else:
            return False

    # Scrape the shop's information fields
    def getShopInfo(self, page):
        shopInfoList = ['brief', 'localName', 'location', 'telephone', 'website', 'ticket', 'openTime', 'shopName', 'shopScore']
        infoItem = ['简介', '英文名称', '地址', '电话', '网址', '门票', '开放时间', '名字', '星评']
        soup = BeautifulSoup(page, 'html.parser')
        shopName = soup.find("div", class_="wrapper").h1.string
        shopScore = soup.find("div", class_="col-main").span.em.string
        for i in range(0, 7):                     # loop over the seven labeled fields
            if self.hasAttr(page, infoItem[i]):
                pattern_shopinfo = re.compile('<div class="col-main.*?<div class="bd">.*?' + infoItem[i] + '</h3>.*?>(.*?)</p>', re.S)
                shopInfos = re.findall(pattern_shopinfo, page)
                # the field exists, so pull out its tag content with a regex
                for shopInfo in shopInfos:
                    shopInfoList[i] = shopInfo
            else:
                continue                          # field missing: try the next label
        shopInfoList[7] = shopName
        shopInfoList[8] = shopScore
        return shopInfoList

Finally the data goes into a dictionary. Where one key maps to several values, e.g. dict = {a: []}, call setdefault(key, []).append(value):

    dict.setdefault('comment', []).append(comment)

Then print it with json.dumps(dict, indent=1).decode("unicode_escape"). The indent parameter lays the data out as a JSON tree, and when the content contains Chinese the decode("unicode_escape") is required, otherwise the output is all \uXXXX escape sequences.
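
A self-contained illustration of both tricks (the shop name and comments here are made-up sample data):

    # -*- coding: utf-8 -*-
    import json

    shop = {'name': u'建兴酒家'}
    shop.setdefault('comment', []).append(u'咖喱蟹很好吃')  # creates the list on first use
    shop.setdefault('comment', []).append(u'只能付现金')    # appends on later calls
    print json.dumps(shop, indent=1).decode('unicode_escape')
    # passing ensure_ascii=False to json.dumps is another common way to keep the Chinese readable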



Here is the complete code. Change the argument of the MFW() instance at the bottom to switch cities, and call saveFood() or saveIntertainment() to collect that city's restaurant or entertainment data respectively.

    #coding:utf-8
    import re
    import urllib2
    from bs4 import BeautifulSoup
    import json
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

    class MFW:
        def __init__(self, city):
            self.siteURL = 'http://www.mafengwo.cn'
            self.city = city
            self.cityDict = {'曼谷': '11045_518', '清迈': '15284_179', '普吉岛': '11047_858', '苏梅': '14210_686', '芭堤雅': '11046_940'}
            self.id = self.cityDict[self.city]
            self.user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
            self.headers = {'User-Agent': self.user_agent}

        # Collect the shop links on one food search page
        def getFoodHref(self, pageid):
            url = "/group/s.php?q=" + self.city + "&p=" + str(pageid) + "&t=cate&kt=1"
            page = self.getDetailPage(url)
            soup = BeautifulSoup(page, 'html.parser')
            FoodHref = []
            FoodLists = soup.find(name="div", attrs={'data-category': 'poi'}).ul
            FoodHrefList = FoodLists.find_all("h3")
            for FoodHrefs in FoodHrefList:
                FoodWebsite = FoodHrefs.a['href']
                FoodHrefShort = str(FoodWebsite).replace('http://www.mafengwo.cn', '')
                FoodHref.append(FoodHrefShort)
            return FoodHref

        # Collect the hotel links
        def getHotelHref(self, pageid):
            url = "/group/s.php?q=" + self.city + "&p=" + str(pageid) + "&t=hotel&kt=1"
            page = self.getDetailPage(url)
            soup = BeautifulSoup(page, 'html.parser')
            hotelHref = []
            hotelHrefLists = soup.find_all("div", class_="hot-about clearfix _j_hotel")
            for hotelHrefList in hotelHrefLists:
                hotelWebsite = hotelHrefList.a['href']
                hotelHrefShort = str(hotelWebsite).replace('http://www.mafengwo.cn', '')
                hotelHref.append(hotelHrefShort)
            return hotelHref

        # Fetch the HTML of the city's overview page
        def getPage(self):
            try:
                url = self.siteURL + "/baike/" + str(self.id) + ".html"
                request = urllib2.Request(url, headers=self.headers)
                response = urllib2.urlopen(request)
                page = response.read()
                pageCode = re.sub(r'<br[ ]?/?>', '\n', page)
                return pageCode
            except urllib2.URLError, e:
                if hasattr(e, "reason"):
                    print e.reason
                return None

        # Fetch the HTML of any page below the domain
        def getDetailPage(self, detailURL):
            try:
                shopURL = self.siteURL + detailURL
                response = urllib2.urlopen(shopURL)
                detailPage = response.read()
                detailPageCode = re.sub(r'<br[ ]?/?>', '\n', detailPage)
                return detailPageCode
            except urllib2.URLError, e:
                if hasattr(e, "reason"):
                    print e.reason
                return None

        # Collect the category list
        def getProject(self):
            page = self.getPage()
            soup = BeautifulSoup(page, 'html.parser')
            projectName = []
            projectId = {}
            projects = soup.find("div", class_="anchor-nav").stripped_strings
            for project in projects:
                projectName.append(project)
            for i in range(len(projectName)):
                projectId[i] = projectName[i]
            return projectId

        # Collect the shop link list
        def getShopHref(self):
            page = self.getPage()
            soup = BeautifulSoup(page, 'html.parser')
            list = soup.find_all("div", class_="poi-card clearfix")
            shopHref = []
            for items in list:
                shopitem = items.find_all("div", class_="item")
                for item in shopitem:
                    shopHref.append(item.a['href'])
            return shopHref

        # Scrape the comments (text plus image URLs)
        def getComment(self, page):
            soup = BeautifulSoup(page, 'html.parser')
            list = soup.find("div", class_="_j_commentlist")
            commentList = list.find_all("div", class_="comment-item")
            commentContent = []
            for item in commentList:
                commentContent.append(item.find('p').string)
                commentImas = item.find_all(name='img', attrs={'height': re.compile('.*?')})
                for commentIma in commentImas:
                    commentContent.append(commentIma.get('src'))
            return commentContent

        # Scrape the travel-note links
        def getTravel(self, page):
            soup = BeautifulSoup(page, 'html.parser')
            items = soup.find_all("li", class_="post-item clearfix")
            travelHref = []
            for item in items:
                travelHref.append(item.find('a').get('href'))
            return travelHref

        # Check whether an information field exists on the page
        def hasAttr(self, page, list):
            soup = BeautifulSoup(page, 'html.parser')
            col = soup.find("div", class_="col-main").find("div", class_="bd")
            str_col = str(col)
            if list in str_col:
                return True
            else:
                return False

        # Scrape the shop's information fields
        def getShopInfo(self, page):
            shopInfoList = ['brief', 'localName', 'location', 'telephone', 'website', 'ticket', 'openTime', 'shopName', 'shopScore']
            infoItem = ['简介', '英文名称', '地址', '电话', '网址', '门票', '开放时间', '名字', '星评']
            soup = BeautifulSoup(page, 'html.parser')
            shopName = soup.find("div", class_="wrapper").h1.string
            shopScore = soup.find("div", class_="col-main").span.em.string
            for i in range(0, 7):
                if self.hasAttr(page, infoItem[i]):
                    pattern_shopinfo = re.compile('<div class="col-main.*?<div class="bd">.*?' + infoItem[i] + '</h3>.*?>(.*?)</p>', re.S)
                    shopInfos = re.findall(pattern_shopinfo, page)
                    for shopInfo in shopInfos:
                        shopInfoList[i] = shopInfo
                else:
                    continue
            shopInfoList[7] = shopName
            shopInfoList[8] = shopScore
            return shopInfoList

        # Scrape and save the restaurant data
        def saveFood(self):
            f = open(r'****.txt', 'w')
            a = 0
            for i in range(51):
                try:
                    foodHrefList = self.getFoodHref(i)
                    for foodHref in foodHrefList:
                        page = self.getDetailPage(foodHref)
                        dict = {}.fromkeys(('description', 'enname', 'location', 'telephone', 'website', 'ticket', 'opentime', 'name', 'rate', 'comment', 'travel'))
                        shopInfos = self.getShopInfo(page)
                        dict['description'] = shopInfos[0]
                        dict['enname'] = shopInfos[1]
                        dict['location'] = shopInfos[2]
                        dict['telephone'] = shopInfos[3]
                        dict['website'] = shopInfos[4]
                        dict['ticket'] = shopInfos[5]
                        dict['opentime'] = shopInfos[6]
                        dict['name'] = shopInfos[7]
                        dict['rate'] = shopInfos[8]
                        comments = self.getComment(page)
                        dict['comment'] = comments
                        travels = self.getTravel(page)
                        dict['travel'] = travels
                        a += 1
                        print json.dumps(dict, indent=1).decode("unicode_escape")
                        print ("=================================================================================" + "\n")
                except AttributeError, e:
                    continue
            f.close()
            print "Finished, " + str(a) + " records in total"

        # Write out the entertainment data
        def saveIntertainment(self):
            f = open(r'****.txt', 'a')
            f.write('\nCity: ' + self.city + '\n\n\n')
            shopProjects = self.getProject()
            for i in shopProjects.keys():
                f.write(str(i) + str(shopProjects[i]) + '\n')
            shopHrefList = self.getShopHref()
            for shopHref in shopHrefList:
                try:
                    page = self.getDetailPage(shopHref)
                    shopInfos = self.getShopInfo(page)
                    for shopInfo in shopInfos:
                        f.write(str(shopInfo) + '\n')
                    comments = self.getComment(page)
                    for comment in comments:
                        f.write(str(comment) + '\n')
                    travels = self.getTravel(page)
                    for travel in travels:
                        f.write(str(travel) + '\n')
                    f.write("======================================================================================================================" + '\n')
                except AttributeError, e:
                    continue
            f.close()
            print "Finished"

    mfw = MFW('曼谷')
    mfw.saveFood()
