Scraping same-city event data from Douban search results with a Python crawler

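Main libraries used:

requests: sends the crawler's HTTP requests and fetches the page source

re: extracts data with regular expressions (imported, though the script below does not call it)

json: decodes the JSON response

pandas: stores the collected data and writes it to CSV

bs4: parses the HTML fragments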
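To illustrate how json and re can divide the work, here is a minimal sketch that runs offline. It assumes the response is a JSON object whose `items` list holds HTML fragments as strings, which is the shape the script in this post expects. The sample payload, the `example.com` URL, the `LINK_RE` pattern and the `extract_links` helper are all hypothetical and are not part of the original script.

```python
import json
import re

# Hypothetical sample payload, shaped like what the script expects:
# a JSON object whose "items" list holds HTML fragments as strings.
SAMPLE_RESPONSE = json.dumps({
    "items": [
        '<div class="result"><a href="https://example.com/event/1" '
        'title="Sample event">Sample event</a></div>',
    ]
})

# Illustrative pattern only; real markup may order attributes differently.
LINK_RE = re.compile(r'<a\s+href="([^"]+)"\s+title="([^"]*)"')


def extract_links(response_text):
    """Decode the JSON body and pull (href, title) pairs out of each fragment."""
    payload = json.loads(response_text)
    links = []
    for fragment in payload.get("items", []):
        match = LINK_RE.search(fragment)
        if match:
            links.append((match.group(1), match.group(2)))
    return links


print(extract_links(SAMPLE_RESPONSE))
```

An empty `items` list yields an empty result, which is the same condition the script uses to stop paging. For real pages an HTML parser such as bs4 is usually more robust than a regular expression.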

The source code follows:

#!coding=utf-8
import requests
import re
import json
from requests.packages.urllib3.exceptions import InsecureRequestWarning
import pandas as pd
from bs4 import BeautifulSoup
import lxml  # parser backend used by BeautifulSoup(..., 'lxml')

# verify=False is used below, so silence the resulting certificate warnings
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


def doubanhudong(q, cat):
    """Scrape Douban search results. q: search term, cat: channel number."""
    s = requests.session()
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Host': 'www.douban.com',
        'Upgrade-Insecure-Requests': '1',
        'X-Requested-': 'XMLHttpRequest',  # the standard name of this header is X-Requested-With
        # 'Referer': 'https://www.douban.com/search?cat={}&q={}'.format(cat, q),
        'Referer': 'https://www.douban.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.15 Safari/537.36',
    }
    s.headers.update(headers)

    activity = []   # channel name
    title = []      # title
    html = []       # URL
    peoplenum = []  # number of participants
    date = []       # event time
    address1 = []   # province
    address2 = []   # city
    address3 = []   # detailed address
    abstract = []   # summary

    # Results are paged 20 at a time; stop at the first empty page.
    for start in range(0, 100000, 20):
        url = 'https://www.douban.com/j/search?q={}&start={}&cat={}'.format(q, start, cat)
        req = s.get(url=url, verify=False)
        req = json.loads(req.text)
        bs = req['items']  # list of HTML fragments, one per result
        if bs == []:
            break
        for r in bs:
            print(r)
            item = BeautifulSoup(r, 'lxml')
            # The info block is read by line position, so this depends on the page layout.
            info = item.find(class_='info').get_text().strip().split('\n')
            activity.append(item.find('span').get_text().strip())
            html.append(item.find('a')['href'].strip())
            title.append(item.find('a')['title'].strip())
            peoplenum.append(info[0])
            date.append(info[2].strip())
            address1.append(info[4].strip())
            address2.append(info[5].strip())
            address3.append(info[6].strip())
            try:
                abstract.append(item.find('p').get_text().strip())
            except AttributeError:  # no <p> element, so no summary
                abstract.append('')

    data = {'activity': activity, 'title': title, 'html': html,
            'peoplenum': peoplenum, 'date': date, 'address1': address1,
            'address2': address2, 'address3': address3, 'abstract': abstract}
    df = pd.DataFrame(data)
    df.to_csv(r'E:\doubai.csv', index=False, encoding="GB18030")


if __name__ == '__main__':
    q = '电影'    # search term
    cat = '1011'  # channel number
    doubanhudong(q, cat)
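The script reads the info block by fixed line positions (0, 2, 4, 5, 6), so a result with fewer lines would raise an IndexError. The following is a rough sketch of a more forgiving version that uses the same positions. The `pick` and `parse_info` helpers and the sample text are hypothetical, and the sample layout is an assumption rather than real Douban output.

```python
def pick(lines, index, default=''):
    """Return the stripped line at `index`, or `default` if it is missing."""
    return lines[index].strip() if index < len(lines) else default


def parse_info(info_text):
    """Split an info block into named fields using the script's line positions."""
    lines = info_text.strip().split('\n')
    return {
        'peoplenum': pick(lines, 0),
        'date': pick(lines, 2),
        'address1': pick(lines, 4),
        'address2': pick(lines, 5),
        'address3': pick(lines, 6),
    }


# Hypothetical info text with the seven-line layout the script assumes.
sample = "12 people\n\n2018-08-01\n\nBeijing\nChaoyang\nSome street"
print(parse_info(sample))
```

Missing lines come back as empty strings, so every list stays the same length and the DataFrame can still be built.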

Note: the parameter q is the search term and cat is the Douban channel number; same-city events are channel 1011.
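As a small illustration of how q, cat and the paging offset combine into a request URL, the sketch below builds the same kind of URL as the script but percent-encodes the parameters with `urllib.parse.urlencode`. The `build_search_url` helper and the `BASE_URL` and `PAGE_SIZE` names are hypothetical. The endpoint and the step of 20 are taken from the script.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.douban.com/j/search'  # endpoint used by the script
PAGE_SIZE = 20  # step between successive `start` offsets in the script


def build_search_url(q, cat, page):
    """Build the search URL for a zero-based page number."""
    params = {'q': q, 'start': page * PAGE_SIZE, 'cat': cat}
    return BASE_URL + '?' + urlencode(params)


print(build_search_url('电影', '1011', 0))
```

Encoding the parameters explicitly keeps the URL well-formed when the search term contains spaces, `&`, or non-ASCII characters.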
