Let's be blunt and go straight to the code.

While scraping, I ran into no cookie- or IP-based anti-scraping measures, and using coroutines for the data-heavy part made it quite fast!
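To see why coroutines help with this kind of job: the time goes mostly into waiting on the network, so overlapping the waits shortens the total run. The sketch below is only an illustration, not the crawler itself. It swaps in the standard library's asyncio instead of gevent (the library this post uses) so that it runs without third-party packages, and `fake_fetch` with its 0.1-second delay is a hypothetical stand-in for a real HTTP request.

```python
import asyncio
import time


async def fake_fetch(tag: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for one HTTP request: waits, then returns."""
    await asyncio.sleep(delay)  # simulates network latency without real I/O
    return f"done:{tag}"


async def fetch_sequential(tags):
    # Each simulated request starts only after the previous one finishes.
    return [await fake_fetch(tag) for tag in tags]


async def fetch_concurrent(tags):
    # All simulated requests wait at the same time; order of results is kept.
    return await asyncio.gather(*(fake_fetch(tag) for tag in tags))


def timed(coro_func, tags):
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_func(tags))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tags = ["热门", "美剧", "英剧", "韩剧", "日剧"]
    _, t_seq = timed(fetch_sequential, tags)
    _, t_con = timed(fetch_concurrent, tags)
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

With five simulated requests, the sequential run should take roughly five times the delay and the concurrent run roughly one delay; real speedups depend on the server and the network.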

# Patch the standard library before requests is imported, so that its
# socket and SSL calls cooperate with gevent.
from gevent import monkey
monkey.patch_all()

import json
import sys

import gevent
import requests


class DoubanTVSpider:
    def __init__(self):
        self.base_url = "https://movie.douban.com/j/search_subjects"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
        }
        self.page_limit = 20

    def parse_url(self, url, params):
        """Send a GET request and return the response body as text."""
        response = requests.get(url, params=params, headers=self.headers)
        assert response.status_code == 200
        return response.content.decode()

    def get_content_list(self, json_str):
        """Parse one page of JSON and count the entries it contains."""
        dict_data = json.loads(json_str)
        print(dict_data)
        count = len(dict_data["subjects"])
        return dict_data, count

    def save_content_list(self, content_list, tag):
        """Append one page, labelled with its tag, to doubanTV.json."""
        content_list.update({"tag": tag})
        with open("doubanTV.json", "a", encoding="utf-8") as f:
            f.write(json.dumps(content_list, ensure_ascii=False, indent=4))
            f.write(";")  # separator between pages
        print("OK")

    def _run(self, tag="热门"):
        """Crawl every page of one tag."""
        params = {
            "type": "tv",
            "tag": tag,
            "sort": "recommend",
            "page_limit": self.page_limit,
            "page_start": 0,
        }
        print(tag)
        while True:
            json_str = self.parse_url(self.base_url, params)
            content_list, count = self.get_content_list(json_str)
            self.save_content_list(content_list, params["tag"])
            if count < self.page_limit:
                # Reached the last page; stop crawling this tag
                break
            params["page_start"] += self.page_limit

    def run(self):
        print("""Enter your choice:
0: 热门 (popular)
1: 美剧 (US series)
2: 英剧 (UK series)
3: 韩剧 (Korean dramas)
4: 日剧 (Japanese dramas)
5: 国产剧 (Chinese domestic series)
6: 港剧 (Hong Kong series)
7: 日本动画 (Japanese animation)
8: 综艺 (variety shows)
9: 纪录片 (documentaries)
10: All of the above
11: Exit""")
        select = input().strip()
        switch_list = {
            "0": "热门",
            "1": "美剧",
            "2": "英剧",
            "3": "韩剧",
            "4": "日剧",
            "5": "国产剧",
            "6": "港剧",
            "7": "日本动画",
            "8": "综艺",
            "9": "纪录片",
        }
        if select == "11":
            sys.exit()
        elif select == "10":
            # Crawl all tags concurrently, one greenlet per tag
            gevent.joinall([gevent.spawn(self._run, tag) for tag in switch_list.values()])
        elif select in switch_list:
            self._run(switch_list[select])
        else:
            print("Invalid choice:", select)


if __name__ == "__main__":
    doubanTV_spider = DoubanTVSpider()
    doubanTV_spider.run()
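One consequence of the save step above: each page is appended as an indented JSON object followed by `;`, so doubanTV.json as a whole is not a single valid JSON document and `json.load` will reject it. A minimal sketch of reading it back, assuming that exact "object, then `;`" layout, could look like this. The helper name `iter_saved_pages` is hypothetical, and the sample text is made up in the same shape as the crawler's output.

```python
import json


def iter_saved_pages(text: str):
    """Yield each JSON object from text shaped like 'obj;obj;obj;'."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip the ';' separators and any whitespace between objects.
        if text[pos] in "; \t\r\n":
            pos += 1
            continue
        # raw_decode parses one JSON value starting at pos and returns
        # the value together with the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


if __name__ == "__main__":
    # Made-up sample in the same shape as the crawler's output file.
    page_one = {"subjects": [{"title": "少年派", "rate": "6.8"}], "tag": "热门"}
    page_two = {"subjects": [], "tag": "热门"}
    sample = (
        json.dumps(page_one, ensure_ascii=False, indent=4) + ";"
        + json.dumps(page_two, ensure_ascii=False, indent=4) + ";"
    )
    for page in iter_saved_pages(sample):
        print(page["tag"], len(page["subjects"]))
```

To read the real file, pass it the result of `open("doubanTV.json", encoding="utf-8").read()`. Because the separator is only skipped between objects, a `;` inside a title does not confuse the parser.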

Scraped results (the full output is too long, so only part is shown)

"title": "我们不能是朋友","url": "https://movie.douban.com/subject/30309331/","playable": false,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2557149941.jpg","id": "30309331","cover_y": 916,"is_new": false},
{"rate": "6.8","cover_x": 1432,"title": "少年派","url": "https://movie.douban.com/subject/27598254/","playable": true,"cover": "https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2558828119.jpg","id": "27598254","cover_y": 2048,"is_new": false},
{"rate": "7.7","cover_x": 1000,"title": "动物管理局","url": "https://movie.douban.com/subject/27107725/","playable": true,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2522426850.jpg","id": "27107725","cover_y": 1419,"is_new": false},
{"rate": "9.5","cover_x": 1204,"title": "我们与恶的距离","url": "https://movie.douban.com/subject/30181230/","playable": true,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2554916825.jpg","id": "30181230","cover_y": 1720,"is_new": false},
{"rate": "3.7","cover_x": 1071,"title": "带着爸爸去留学","url": "https://movie.douban.com/subject/30238247/","playable": true,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2540911433.jpg","id": "30238247","cover_y": 1500,"is_new": false},
{"rate": "7.8","cover_x": 6732,"title": "大宋少年志","url": "https://movie.douban.com/subject/30170894/","playable": true,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2559266281.jpg","id": "30170894","cover_y": 11968,"is_new": false},
{"rate": "9.2","cover_x": 600,"title": "大小谎言 第二季","url": "https://movie.douban.com/subject/27195401/","playable": false,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2558544696.jpg","id": "27195401","cover_y": 889,"is_new": false},
{"rate": "8.8","cover_x": 750,"title": "请输入搜索词:WWW","url": "https://movie.douban.com/subject/30403333/","playable": false,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2556419121.jpg","id": "30403333","cover_y": 1062,"is_new": false},
{"rate": "8.7","cover_x": 803,"title": "春夜","url": "https://movie.douban.com/subject/30428225/","playable": false,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2554598542.jpg","id": "30428225","cover_y": 1200,"is_new": false},
{"rate": "9.4","cover_x": 1080,"title": "这!就是街舞 第二季","url": "https://movie.douban.com/subject/30486671/","playable": true,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2552553650.jpg","id": "30486671","cover_y": 1566,"is_new": false},
{"rate": "8.8","cover_x": 1457,"title": "好兆头","url": "https://movie.douban.com/subject/26846856/","playable": false,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2558290974.jpg","id": "26846856","cover_y": 2159,"is_new": false},
{"rate": "7.0","cover_x": 770,"title": "暗恋橘生淮南","url": "https://movie.douban.com/subject/26811775/","playable": true,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2559173820.jpg","id": "26811775","cover_y": 1080,"is_new": false},
{"rate": "8.4","cover_x": 800,"title": "吹落的树叶","url": "https://movie.douban.com/subject/30438479/","playable": false,"cover": "https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2550315269.jpg","id": "30438479","cover_y": 1131,"is_new": false},
{"rate": "8.6","cover_x": 945,"title": "白色强人","url": "https://movie.douban.com/subject/27195042/","playable": false,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2559049945.jpg","id": "27195042","cover_y": 1350,"is_new": false},
{"rate": "7.1","cover_x": 1080,"title": "破冰行动","url": "https://movie.douban.com/subject/27052168/","playable": true,"cover": "https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2556232270.jpg","id": "27052168","cover_y": 1920,"is_new": false}],"tag": "热门"
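As the output shows, `rate` arrives as a string rather than a number, so it has to be converted before sorting or filtering. The sketch below ranks a few of the records above by rating. The helper names `parse_rate` and `top_rated` are hypothetical, the sample keeps only the `title` and `rate` fields, and treating a missing or non-numeric `rate` as 0.0 is a defensive assumption, not something the output above demonstrates.

```python
# A few records copied from the output above, trimmed to two fields.
SAMPLE_SUBJECTS = [
    {"rate": "6.8", "title": "少年派"},
    {"rate": "9.5", "title": "我们与恶的距离"},
    {"rate": "3.7", "title": "带着爸爸去留学"},
    {"rate": "9.2", "title": "大小谎言 第二季"},
]


def parse_rate(subject: dict) -> float:
    """Convert the string 'rate' field to a float.

    Assumption: a missing or non-numeric rate is treated as 0.0.
    """
    try:
        return float(subject.get("rate", ""))
    except (TypeError, ValueError):
        return 0.0


def top_rated(subjects, n=3):
    """Return (title, rate) pairs for the n highest-rated subjects."""
    ranked = sorted(subjects, key=parse_rate, reverse=True)
    return [(s["title"], parse_rate(s)) for s in ranked[:n]]


if __name__ == "__main__":
    for title, rate in top_rated(SAMPLE_SUBJECTS, n=2):
        print(rate, title)
```

The same function works on the `subjects` list of any page returned by the endpoint, since it only relies on the `title` and `rate` keys.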
