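Scraping Douban TV shows with requests
Let's skip the preamble and go straight to the code.
While crawling, I did not run into any cookie-based or IP-based anti-scraping measures. For the large job (crawling every tag at once), coroutines make things pretty fast!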
from gevent import monkey

# Patch the socket module as early as possible so that blocking
# network calls yield to other greenlets.
monkey.patch_socket()

import json

import gevent
import requests


class DoubanTVSpider:
    def __init__(self):
        self.base_url = "https://movie.douban.com/j/search_subjects"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
        }
        self.page_limit = 20

    def parse_url(self, url, params):
        """Send a GET request and return the response body as text."""
        response = requests.get(url, params=params, headers=self.headers)
        assert response.status_code == 200
        return response.content.decode()

    def get_content_list(self, json_str):
        """Parse one page of JSON and count the subjects on it."""
        dict_data = json.loads(json_str)
        print(dict_data)
        count = len(dict_data["subjects"])
        return dict_data, count

    def save_cotent_list(self, content_list, tag):
        """Append one page, labelled with its tag, to doubanTV.json."""
        content_list.update({"tag": tag})
        with open("doubanTV.json", "a", encoding="utf-8") as f:
            f.write(json.dumps(content_list, ensure_ascii=False, indent=4))
            # Pages are separated by ';', so the file as a whole is not valid JSON.
            f.write(";")
        print("OK")

    def _run(self, tag="热门"):
        """Crawl every page of a single tag."""
        params = {
            "type": "tv",
            "tag": tag,
            "sort": "recommend",
            "page_limit": self.page_limit,
            "page_start": 0,
        }
        print(tag)
        while True:
            json_str = self.parse_url(self.base_url, params)
            content_list, count = self.get_content_list(json_str)
            self.save_cotent_list(content_list, params["tag"])
            if count < self.page_limit:
                # Reached the last page; stop crawling this tag.
                break
            params["page_start"] += self.page_limit

    def run(self):
        print("""Enter your choice:
0: "热门"
1: "美剧"
2: "英剧"
3: "韩剧"
4: "日剧"
5: "国产剧"
6: "港剧"
7: "日本动画"
8: "综艺"
9: "纪录片"
10: all of the above
11: exit""")
        select = input()
        # The tag values are sent to Douban as-is, so they stay in Chinese.
        switch_list = {
            "0": "热门",      # trending
            "1": "美剧",      # US series
            "2": "英剧",      # UK series
            "3": "韩剧",      # Korean series
            "4": "日剧",      # Japanese series
            "5": "国产剧",    # mainland Chinese series
            "6": "港剧",      # Hong Kong series
            "7": "日本动画",  # Japanese animation
            "8": "综艺",      # variety shows
            "9": "纪录片",    # documentaries
        }
        if select == "11":
            exit()
        elif select == "10":
            # Crawl all tags concurrently, one greenlet per tag.
            gevent.joinall([gevent.spawn(self._run, switch_list[i]) for i in switch_list])
        else:
            self._run(switch_list[select])


if __name__ == "__main__":
    doubanTV_spider = DoubanTVSpider()
    doubanTV_spider.run()
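The paging rule inside `_run` (keep requesting until a page comes back with fewer than `page_limit` entries) can be pulled out and tried without touching the network. Below is a minimal sketch of that idea, not the script's actual code: `iter_pages`, `fake_fetch`, and `fake_data` are made-up names for illustration, and `fake_fetch` stands in for the real HTTP request.

```python
def iter_pages(fetch_page, page_limit=20):
    """Yield pages until a short page signals the end of the listing.

    fetch_page is a hypothetical callable: it takes page_start and
    returns the list of subjects for that page.
    """
    page_start = 0
    while True:
        subjects = fetch_page(page_start)
        yield subjects
        if len(subjects) < page_limit:
            # A short (or empty) page means there is nothing left to fetch.
            break
        page_start += page_limit


# Offline stand-in for the Douban endpoint: 45 fake subjects.
fake_data = [{"id": str(i)} for i in range(45)]


def fake_fetch(page_start, page_limit=20):
    """Return one slice of the fake data, like one page of results."""
    return fake_data[page_start:page_start + page_limit]


pages = list(iter_pages(fake_fetch))
page_sizes = [len(page) for page in pages]
print(page_sizes)  # [20, 20, 5]
```

One thing this makes visible: when the total is an exact multiple of `page_limit`, the final request returns an empty page, and the loop in `_run` saves that empty page too before it stops.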
Crawl results (the full output is far too long, so only part of it is shown)
"title": "我们不能是朋友", "url": "https://movie.douban.com/subject/30309331/", "playable": false, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2557149941.jpg", "id": "30309331", "cover_y": 916, "is_new": false},
{"rate": "6.8", "cover_x": 1432, "title": "少年派", "url": "https://movie.douban.com/subject/27598254/", "playable": true, "cover": "https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2558828119.jpg", "id": "27598254", "cover_y": 2048, "is_new": false},
{"rate": "7.7", "cover_x": 1000, "title": "动物管理局", "url": "https://movie.douban.com/subject/27107725/", "playable": true, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2522426850.jpg", "id": "27107725", "cover_y": 1419, "is_new": false},
{"rate": "9.5", "cover_x": 1204, "title": "我们与恶的距离", "url": "https://movie.douban.com/subject/30181230/", "playable": true, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2554916825.jpg", "id": "30181230", "cover_y": 1720, "is_new": false},
{"rate": "3.7", "cover_x": 1071, "title": "带着爸爸去留学", "url": "https://movie.douban.com/subject/30238247/", "playable": true, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2540911433.jpg", "id": "30238247", "cover_y": 1500, "is_new": false},
{"rate": "7.8", "cover_x": 6732, "title": "大宋少年志", "url": "https://movie.douban.com/subject/30170894/", "playable": true, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2559266281.jpg", "id": "30170894", "cover_y": 11968, "is_new": false},
{"rate": "9.2", "cover_x": 600, "title": "大小谎言 第二季", "url": "https://movie.douban.com/subject/27195401/", "playable": false, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2558544696.jpg", "id": "27195401", "cover_y": 889, "is_new": false},
{"rate": "8.8", "cover_x": 750, "title": "请输入搜索词:WWW", "url": "https://movie.douban.com/subject/30403333/", "playable": false, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2556419121.jpg", "id": "30403333", "cover_y": 1062, "is_new": false},
{"rate": "8.7", "cover_x": 803, "title": "春夜", "url": "https://movie.douban.com/subject/30428225/", "playable": false, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2554598542.jpg", "id": "30428225", "cover_y": 1200, "is_new": false},
{"rate": "9.4", "cover_x": 1080, "title": "这!就是街舞 第二季", "url": "https://movie.douban.com/subject/30486671/", "playable": true, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2552553650.jpg", "id": "30486671", "cover_y": 1566, "is_new": false},
{"rate": "8.8", "cover_x": 1457, "title": "好兆头", "url": "https://movie.douban.com/subject/26846856/", "playable": false, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2558290974.jpg", "id": "26846856", "cover_y": 2159, "is_new": false},
{"rate": "7.0", "cover_x": 770, "title": "暗恋橘生淮南", "url": "https://movie.douban.com/subject/26811775/", "playable": true, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2559173820.jpg", "id": "26811775", "cover_y": 1080, "is_new": false},
{"rate": "8.4", "cover_x": 800, "title": "吹落的树叶", "url": "https://movie.douban.com/subject/30438479/", "playable": false, "cover": "https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2550315269.jpg", "id": "30438479", "cover_y": 1131, "is_new": false},
{"rate": "8.6", "cover_x": 945, "title": "白色强人", "url": "https://movie.douban.com/subject/27195042/", "playable": false, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2559049945.jpg", "id": "27195042", "cover_y": 1350, "is_new": false},
{"rate": "7.1", "cover_x": 1080, "title": "破冰行动", "url": "https://movie.douban.com/subject/27052168/", "playable": true, "cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2556232270.jpg", "id": "27052168", "cover_y": 1920, "is_new": false}],
"tag": "热门"
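Because every page is appended to `doubanTV.json` followed by a `;`, the file ends up as a sequence of JSON documents rather than one valid JSON document, so a plain `json.load` will not read it. Here is a rough sketch of one way to read it back with the standard library's `json.JSONDecoder.raw_decode`. The helper name `iter_json_documents` and the sample string are my own illustration and not part of the script above. Splitting on `;` would be simpler, but it breaks as soon as a title contains a semicolon, which is why the sample includes one.

```python
import json


def iter_json_documents(text, separator=";"):
    """Yield each JSON document from text where documents are joined by separator."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so step over
        # whitespace and separators by hand.
        while pos < end and (text[pos].isspace() or text[pos] == separator):
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Small sample in the same shape the spider writes (made-up content).
sample = (
    '{"subjects": [{"title": "a;b"}], "tag": "热门"};'
    '{\n    "subjects": [],\n    "tag": "美剧"\n};'
)

docs = list(iter_json_documents(sample))
print(len(docs))  # 2
```

To use it on the real file, read the whole file as text with `encoding="utf-8"` and pass that string in.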