Feature description V1.0:

Scrape the Douban Movie Top 250 ranking.
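The scripts in this post fetch the ranking 25 movies at a time and select a page through the `start` query parameter. A minimal sketch of how the ten page URLs can be generated; the helper name `page_urls` is made up for illustration and does not appear in the original code:

```python
BASE_URL = 'https://movie.douban.com/top250?start='


def page_urls(total=250, page_size=25):
    """Return the URL of every page (hypothetical helper, not in the original)."""
    return [BASE_URL + str(start) for start in range(0, total, page_size)]


urls = page_urls()
print(len(urls))   # number of pages
print(urls[0])     # first page
print(urls[-1])    # last page
```

The offsets run from 0 to 225 in steps of 25, which matches the `range(0, 250, 25)` loop used by both versions of the script.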

Feature analysis:

Libraries used

1. time

2. json

3. requests

4. BeautifulSoup

5. RequestException

Hands-on lab:

"""
Author: 李舵
Date: 2019-4-27
Function: scrape the Douban Movie Top 250
Version: V1.0
"""
import time
import json
import requests
from bs4 import BeautifulSoup
from requests.exceptions import RequestException


def get_one_page(url):
    # Fetch one page; return its HTML text, or None on any failure.
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/74.0.3729.108 Safari/537.36'
        }
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse_one_page(html):
    # Yield one dict per movie found in the <ol class="grid_view"> list.
    soup = BeautifulSoup(html, 'lxml')
    ol_list = soup.find('ol', {'class': 'grid_view'})
    li_list = ol_list.find_all('li')
    # Slicing avoids an IndexError when a page holds fewer than 25 items.
    for move_value in li_list[:25]:
        yield {
            'index': move_value.find('em', {'class': ''}).text.strip(),
            'title': move_value.find('span', {'class': 'title'}).text.strip(),
            'actor': move_value.find('p', {'class': ''}).text.strip(),
            'score': move_value.find('span', {'class': 'rating_num'}).text.strip()
        }


def write_to_file(content):
    # Append one movie per line as JSON; ensure_ascii=False keeps Chinese text readable.
    with open('result.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main(start):
    url = 'https://movie.douban.com/top250?start=' + str(start)
    html = get_one_page(url)
    if html is None:
        # The request failed or returned a non-200 status; skip this page.
        return
    for item in parse_one_page(html):
        print(item)
        write_to_file(item)


if __name__ == '__main__':
    # 10 pages of 25 movies each, pausing 1 second between requests.
    for i in range(0, 250, 25):
        main(start=i)
        time.sleep(1)
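Because `write_to_file` stores one JSON object per line, the output can be loaded back line by line. The sketch below is a hedged illustration, not part of the original script: `read_results` is a hypothetical helper, `write_to_file` takes an extra `path` argument for the demo, and the sample record and temporary file stand in for real scraped data.

```python
import json
import os
import tempfile


def write_to_file(content, path):
    # Same idea as the V1 function, with the path passed in for this demo.
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def read_results(path):
    """Load every non-empty line as one JSON object (hypothetical helper)."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up sample record, used only to exercise the round trip.
sample = {'index': '1', 'title': '示例电影', 'score': '9.0'}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'result.txt')
    write_to_file(sample, path)
    write_to_file(sample, path)
    records = read_results(path)

print(len(records), records[0]['title'])
```

Since the file is opened in append mode, running the scraper twice adds a second copy of every record; deleting `result.txt` before a fresh run avoids duplicates.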

Feature description V2.0:

Scrape the Douban Movie Top 250 ranking.

Feature analysis:

Libraries used

1. re

2. time

3. requests

4. RequestException

Hands-on lab:

"""
Author: 李舵
Date: 2019-4-8
Function: scrape the Douban Movie Top 250
Version: V2.0
"""
import re
import time
import requests
from requests.exceptions import RequestException


def get_one_page(url):
    # Fetch one page; return its HTML text, or None on any failure.
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/73.0.3683.103 Safari/537.36'
        }
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse_one_page(html):
    # One capture group per field; re.S lets '.' match across newlines.
    # Raw strings keep '\s' from being treated as a string escape.
    pattern = re.compile(
        r'<div.*?class="item">.*?'
        r'<div.*?class="pic">.*?'
        r'<em.*?class="">(.*?)</em>.*?'
        r'<div.*?class="info">.*?'
        r'<span.*?class="title">(.*?)</span>.*?'
        r'<span.*?class="other">(.*?)</span>.*?'
        r'<div.*?class="bd">.*?'
        r'<p.*?class="">.*?'
        r'导演:\s(.*?)\s.*?<br>'
        r'(.*?) / '
        r'(.*?) / (.*?)</p>.*?'
        r'<div.*?class="star">.*?'
        r'<span.*?class="rating_num".*?property="v:average">'
        r'(.*?)</span>.*?'
        r'<span>(.*?)人评价</span>.*?'
        r'<span.*?class="inq">(.*?)</span>',
        re.S)
    movies = re.findall(pattern, html)
    movie_list = []
    for movie in movies:
        movie_list.append([
            movie[0],
            movie[1],
            movie[2].lstrip(' / '),
            movie[3],
            movie[4].lstrip(),
            movie[5],
            movie[6].strip(),
            movie[7],
            movie[8],
            movie[9]
        ])
    return movie_list


def write_to_file(movie_list):
    # Append ('a') so every page is kept; 'w' would overwrite the previous pages.
    # The with block closes the file, so no explicit f.close() is needed.
    with open('top_250.txt', 'a', encoding='utf-8') as f:
        for movie in movie_list:
            f.write('Rank: ' + movie[0] + '\n')
            f.write('Title: ' + movie[1] + '\n')
            f.write('Alternative title: ' + movie[2] + '\n')
            f.write('Director: ' + movie[3] + '\n')
            f.write('Release year: ' + movie[4] + '\n')
            f.write('Country/region: ' + movie[5] + '\n')
            f.write('Genre: ' + movie[6] + '\n')
            f.write('Rating: ' + movie[7] + '\n')
            f.write('Number of ratings: ' + movie[8] + '\n')
            f.write('Short review: ' + movie[9] + '\n')
            f.write('\n')
    print('Wrote %d records to the file...' % len(movie_list))


def main(start):
    url = 'https://movie.douban.com/top250?start=' + str(start)
    html = get_one_page(url)
    if html is None:
        # The request failed or returned a non-200 status; skip this page.
        return
    movie_list = parse_one_page(html)
    write_to_file(movie_list)


if __name__ == '__main__':
    # 10 pages of 25 movies each, pausing 1 second between requests.
    for i in range(0, 250, 25):
        main(start=i)
        time.sleep(1)
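The V2 pattern relies on positional groups, so each field is read by index (`movie[0]`, `movie[1]`, and so on). As an illustration only, the sketch below applies the same technique (`re.S` with lazy `.*?`) to a simplified, made-up HTML fragment and uses named groups instead. The fragment, the `ITEM_PATTERN` name, and the `parse_items` helper are assumptions and do not reproduce the real Douban markup.

```python
import re

# Simplified, made-up markup; the real page has many more elements.
SAMPLE_HTML = '''
<div class="item">
  <em class="">1</em>
  <span class="title">Example Movie</span>
  <span class="rating_num" property="v:average">9.0</span>
</div>
<div class="item">
  <em class="">2</em>
  <span class="title">Another Movie</span>
  <span class="rating_num" property="v:average">8.5</span>
</div>
'''

# re.S lets '.' cross line breaks; '.*?' stops at the first possible match.
ITEM_PATTERN = re.compile(
    r'<div class="item">.*?'
    r'<em class="">(?P<rank>.*?)</em>.*?'
    r'<span class="title">(?P<title>.*?)</span>.*?'
    r'<span class="rating_num"[^>]*>(?P<score>.*?)</span>',
    re.S,
)


def parse_items(html):
    """Return one dict per matched item (hypothetical helper)."""
    return [m.groupdict() for m in ITEM_PATTERN.finditer(html)]


movies = parse_items(SAMPLE_HTML)
for movie in movies:
    print(movie['rank'], movie['title'], movie['score'])
```

Named groups keep the field names next to the pattern, which makes it harder to mix up indexes when the pattern changes.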

Additional notes:

1.

Reposted from: https://www.cnblogs.com/liduo0413/p/10779802.html
