Scraping Douban's List of Upcoming Movies with Python
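I recently watched Terminator: Dark Fate, and having seen several other films before that, I found myself hooked on movies. I wanted to know which films are coming out soon and whether any of them would interest me. A quick search showed that Douban has a dedicated page listing upcoming releases.
The page is at https://movie.douban.com/coming.
Incidentally, Douban also publishes a 2018 annual movie chart.
Here is the implementation: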
#!/usr/bin/env python
# encoding: utf-8
from __future__ import division

'''
__Author__: 沂水寒城
Purpose: scrape the list of upcoming movies from Douban with Python
'''

import sys
import urllib
from lxml import etree

# Python 2 only: make utf-8 the default string encoding.
# (The original compared sys.version_info itself to 2, which is never true.)
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding("utf-8")


def doubanComingMovieSpider():
    '''
    Fetch the upcoming-movie table from Douban and print every row
    '''
    page_html = urllib.urlopen('https://movie.douban.com/coming').read()
    print 'html_length: ', len(page_html)
    hdoc = etree.HTML(page_html)
    htree = etree.ElementTree(hdoc)
    for i in range(1, 80):
        # XPath prefix for the i-th row of the table body
        row = '//*[@id="content"]/div/div[1]/table/tbody/tr[' + str(i) + ']'
        times = htree.xpath(row + '/td[1]/text()')    # release date
        name = htree.xpath(row + '/td[2]/a/text()')   # title
        types = htree.xpath(row + '/td[3]/text()')    # genres
        zone = htree.xpath(row + '/td[4]/text()')     # country / region
        number = htree.xpath(row + '/td[5]/text()')   # number of people
        if not name:
            # xpath() returns an empty list once i runs past the last row
            continue
        print 'times: ', times
        print 'name: ', name
        print 'types: ', types
        print 'zone: ', zone
        print 'number: ', number


if __name__ == '__main__':
    doubanComingMovieSpider()
The implementation is simple, with nothing complicated about it. The output is as follows:
html_length: 55682
times: [u'\n 11\u670807\u65e5\n\n \n ']
name: [u'\u8d8a\u57df\u91cd\u751f']
types: [u'\n \u52a8\u4f5c / \u72af\u7f6a / \u60ca\u609a\n ']
zone: [u'\n \u7f8e\u56fd\n ']
number: [u'\n 565\u4eba\n ']
times: [u'\n 11\u670807\u65e5\n\n \n ']
name: [u'\u90a3\u5ea7\u6865']
types: [u'\n \u5267\u60c5 / \u5bb6\u5ead\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 123\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u51b3\u6218\u4e2d\u9014\u5c9b']
types: [u'\n \u5267\u60c5 / \u5386\u53f2 / \u6218\u4e89\n ']
zone: [u'\n \u7f8e\u56fd\n ']
number: [u'\n 19179\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u5ba0\u7269\u8054\u76df']
types: [u'\n \u559c\u5267 / \u52a8\u753b\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646 / \u5fb7\u56fd\n ']
number: [u'\n 13110\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u53d7\u76ca\u4eba']
types: [u'\n \u5267\u60c5 / \u559c\u5267 / \u7231\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 10459\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u6211\u7684\u62f3\u738b\u7537\u53cb']
types: [u'\n \u7231\u60c5 / \u8fd0\u52a8\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 1517\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u6b66\u6797\u5b64\u513f']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 1282\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u8ba2\u4eb2']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 797\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u9ec4\u82b1\u5858\u5f80\u4e8b']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 563\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u5269\u5973\u89c5\u7231\u8bb0']
types: [u'\n \u5267\u60c5 / \u7231\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 528\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u4e00\u4e2a\u4eba\u7684\u57ce\u5e02']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 85\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u7231\u60c5\u56fe\u9274\u4e4b\u6697\u604b']
types: [u'\n \u7231\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 57\u4eba\n ']
times: [u'\n 11\u670808\u65e5\n\n \n ']
name: [u'\u5c0f\u5fc3\u201c\u9677\u9631\u201d']
types: [u'\n \u559c\u5267\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 35\u4eba\n ']
times: [u'\n 11\u670809\u65e5\n\n \n ']
name: [u'\u81f4\u656c\u82f1\u96c4']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 146\u4eba\n ']
times: [u'\n 11\u670810\u65e5\n\n \n ']
name: [u'\u642d\u79cb\u5343\u7684\u4eba']
types: [u'\n \u5267\u60c5 / \u5bb6\u5ead\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 222\u4eba\n ']
times: [u'\n 11\u670811\u65e5\n\n \n ']
name: [u'\u4ed6\u4eec\u5df2\u4e0d\u518d\u53d8\u8001']
types: [u'\n \u5386\u53f2 / \u6218\u4e89 / \u7eaa\u5f55\u7247\n ']
zone: [u'\n \u82f1\u56fd / \u65b0\u897f\u5170\n ']
number: [u'\n 34460\u4eba\n ']
times: [u'\n 11\u670812\u65e5\n\n \n ']
name: [u'\u5c0f\u8f7f\u8f66']
types: [u'\n \u5267\u60c5 / \u513f\u7ae5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 49\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u6d77\u4e0a\u94a2\u7434\u5e08']
types: [u'\n \u5267\u60c5 / \u97f3\u4e50\n ']
zone: [u'\n \u610f\u5927\u5229\n ']
number: [u'\n 276733\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u76d7\u68a6\u7279\u653b\u961f']
types: [u'\n \u72af\u7f6a / \u52a8\u753b\n ']
zone: [u'\n \u5308\u7259\u5229\n ']
number: [u'\n 70562\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u9ea6\u5b50\u7684\u76d6\u5934']
types: [u'\n \u5267\u60c5 / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 8836\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u9739\u96f3\u5a07\u5a03']
types: [u'\n \u52a8\u4f5c / \u5192\u9669\n ']
zone: [u'\n \u7f8e\u56fd\n ']
number: [u'\n 8813\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u957f\u5b89\u9053']
types: [u'\n \u5267\u60c5 / \u72af\u7f6a / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 8217\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u5927\u7ea6\u5728\u51ac\u5b63']
types: [u'\n \u5267\u60c5 / \u7231\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 4809\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u840c\u5ba0\u7279\u5de5\u961f']
types: [u'\n \u52a8\u753b / \u5192\u9669 / \u5bb6\u5ead\n ']
zone: [u'\n \u5fb7\u56fd / \u6bd4\u5229\u65f6\n ']
number: [u'\n 1696\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u90a3\u4e00\u591c\uff0c\u6211\u7ed9\u4f60\u5f00\u8fc7\u8f66']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 827\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u7236\u5b50\u62f3\u738b']
types: [u'\n \u8fd0\u52a8 / \u5bb6\u5ead\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 220\u4eba\n ']
times: [u'\n 11\u670815\u65e5\n\n \n ']
name: [u'\u64bc\u5c71\u7476']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 82\u4eba\n ']
times: [u'\n 11\u670816\u65e5\n\n \n ']
name: [u'\u4e00\u8f66\u56db\u4ec6']
types: [u'\n \u559c\u5267\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 71\u4eba\n ']
times: [u'\n 11\u670818\u65e5\n\n \n ']
name: [u'\u6211\u5728\u539f\u5730\u7b49\u4f60']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 163\u4eba\n ']
times: [u'\n 11\u670819\u65e5\n\n \n ']
name: [u'\u706b\u7ea2\u9752\u6625']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 39\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u51b0\u96ea\u5947\u7f182']
types: [u'\n \u559c\u5267 / \u52a8\u753b / \u5192\u9669\n ']
zone: [u'\n \u7f8e\u56fd\n ']
number: [u'\n 39712\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u522b\u544a\u8bc9\u5979']
types: [u'\n \u5267\u60c5 / \u559c\u5267\n ']
zone: [u'\n \u7f8e\u56fd\n ']
number: [u'\n 28430\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u4f60\u662f\u51f6\u624b']
types: [u'\n \u5267\u60c5 / \u72af\u7f6a / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 4883\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u96f7\u7c73\u5947\u9047\u8bb0']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u6cd5\u56fd\n ']
number: [u'\n 2990\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u8ffd\u51f6\u5341\u4e5d\u5e74']
types: [u'\n \u5267\u60c5 / \u72af\u7f6a / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 2971\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u5f81\u9014']
types: [u'\n \u52a8\u4f5c / \u5192\u9669\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 2262\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u72ac\u7231']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 210\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u5a31\u4e50\u8ffd\u51fb']
types: [u'\n \u559c\u5267 / \u52a8\u4f5c\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 54\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u9aa8\u74f7']
types: [u'\n \u6050\u6016\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 25\u4eba\n ']
times: [u'\n 11\u670822\u65e5\n\n \n ']
name: [u'\u7231\xb7\u4e4b\u75d5']
types: [u'\n \u5267\u60c5 / \u7231\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 24\u4eba\n ']
times: [u'\n 11\u670828\u65e5\n\n \n ']
name: [u'\u5f52\u53bb']
types: [u'\n \u5267\u60c5 / \u5bb6\u5ead\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 413\u4eba\n ']
times: [u'\n 11\u670829\u65e5\n\n \n ']
name: [u'\u4e24\u53ea\u8001\u864e']
types: [u'\n \u5267\u60c5 / \u559c\u5267\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 11856\u4eba\n ']
times: [u'\n 11\u670829\u65e5\n\n \n ']
name: [u'\u5e73\u539f\u4e0a\u7684\u590f\u6d1b\u514b']
types: [u'\n \u5267\u60c5 / \u559c\u5267 / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 7393\u4eba\n ']
times: [u'\n 11\u670829\u65e5\n\n \n ']
name: [u'\u8863\u67dc\u91cc\u7684\u5192\u9669\u738b']
types: [u'\n \u5267\u60c5 / \u559c\u5267\n ']
zone: [u'\n \u6cd5\u56fd / \u5370\u5ea6\n ']
number: [u'\n 4128\u4eba\n ']
times: [u'\n 11\u670829\u65e5\n\n \n ']
name: [u'\u4e00\u751f\u6709\u4f60']
types: [u'\n \u7231\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 2247\u4eba\n ']
times: [u'\n 11\u670829\u65e5\n\n \n ']
name: [u'\u51b0\u5cf0\u66b4']
types: [u'\n \u52a8\u4f5c / \u707e\u96be\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 1262\u4eba\n ']
times: [u'\n 12\u670806\u65e5\n\n \n ']
name: [u'\u5357\u65b9\u8f66\u7ad9\u7684\u805a\u4f1a']
types: [u'\n \u5267\u60c5 / \u72af\u7f6a\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646 / \u6cd5\u56fd\n ']
number: [u'\n 88168\u4eba\n ']
times: [u'\n 12\u670806\u65e5\n\n \n ']
name: [u'\u5439\u54e8\u4eba']
types: [u'\n \u5267\u60c5 / \u72af\u7f6a / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 16836\u4eba\n ']
times: [u'\n 12\u670806\u65e5\n\n \n ']
name: [u'\u7f51\u7edc\u51f6\u94c3']
types: [u'\n \u60ca\u609a / \u6050\u6016\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 106\u4eba\n ']
times: [u'\n 12\u670806\u65e5\n\n \n ']
name: [u'\u8ff7\u5c40\u4f0f\u9999']
types: [u'\n \u5267\u60c5 / \u72af\u7f6a / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646 / \u4e2d\u56fd\u6fb3\u95e8\n ']
number: [u'\n 28\u4eba\n ']
times: [u'\n 12\u670807\u65e5\n\n \n ']
name: [u'\u5170\u5fc3\u5927\u5267\u9662']
types: [u'\n \u5267\u60c5 / \u52a8\u4f5c\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 65738\u4eba\n ']
times: [u'\n 12\u670807\u65e5\n\n \n ']
name: [u'\u9f99\u4e4b\u8c37\uff1a\u7834\u6653\u5947\u5175']
types: [u'\n \u52a8\u753b / \u5947\u5e7b / \u5192\u9669\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 7690\u4eba\n ']
times: [u'\n 12\u670812\u65e5\n\n \n ']
name: [u'\u5929\u706b']
types: [u'\n \u52a8\u4f5c / \u5192\u9669\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 578\u4eba\n ']
times: [u'\n 12\u670813\u65e5\n\n \n ']
name: [u'\u88ab\u5149\u6293\u8d70\u7684\u4eba']
types: [u'\n \u5267\u60c5 / \u7231\u60c5 / \u79d1\u5e7b\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 6113\u4eba\n ']
times: [u'\n 12\u670820\u65e5\n\n \n ']
name: [u'\u53f6\u95ee4']
types: [u'\n \u4f20\u8bb0 / \u52a8\u4f5c / \u5386\u53f2\n ']
zone: [u'\n \u4e2d\u56fd\u9999\u6e2f\n ']
number: [u'\n 8283\u4eba\n ']
times: [u'\n 12\u670820\u65e5\n\n \n ']
name: [u'\u53ea\u6709\u82b8\u77e5\u9053']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 4500\u4eba\n ']
times: [u'\n 12\u670820\u65e5\n\n \n ']
name: [u'\u8bef\u6740']
types: [u'\n \u5267\u60c5 / \u72af\u7f6a\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 1574\u4eba\n ']
times: [u'\n 12\u670820\u65e5\n\n \n ']
name: [u'\u767d\u65e5\u8ff7\u96fe']
types: [u'\n \u72af\u7f6a / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 456\u4eba\n ']
times: [u'\n 12\u670824\u65e5\n\n \n ']
name: [u'\u8fc7\u6e21\u7a7a\u95f4']
types: [u'\n \u52a8\u4f5c / \u79d1\u5e7b / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 952\u4eba\n ']
times: [u'\n 12\u670828\u65e5\n\n \n ']
name: [u'\u7ad9\u4f4f\uff01\u5c0f\u5077']
types: [u'\n \u559c\u5267\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 20521\u4eba\n ']
times: [u'\n 12\u670828\u65e5\n\n \n ']
name: [u'\u4e2d\u534e\u718a\u732b']
types: [u'\n \u52a8\u753b\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 146\u4eba\n ']
times: [u'\n 12\u670831\u65e5\n\n \n ']
name: [u'\u5ba0\u7231']
types: [u'\n \u5267\u60c5 / \u559c\u5267 / \u7231\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 4201\u4eba\n ']
times: [u'\n 12\u6708\n\n \n ']
name: [u'\u6625\u6c5f\u6c34\u6696']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 8180\u4eba\n ']
times: [u'\n 2020\u5e7401\u670801\u65e5\n\n \n ']
name: [u'\u8d1d\u80af\u718a2\uff1a\u91d1\u724c\u7279\u5de5']
types: [u'\n \u52a8\u753b\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 186\u4eba\n ']
times: [u'\n 2020\u5e7401\u670801\u65e5\n\n \n ']
name: [u'\u963f\u91cc\u5df4\u5df4\u4e0e\u795e\u706f']
types: [u'\n \u52a8\u753b / \u5192\u9669\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 93\u4eba\n ']
times: [u'\n 2020\u5e7401\u670811\u65e5\n\n \n ']
name: [u'\u5c71\u6d77\u7ecf\u4e4b\u5c0f\u4eba\u56fd']
types: [u'\n \u513f\u7ae5 / \u52a8\u753b / \u5192\u9669\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646 / \u7f8e\u56fd\n ']
number: [u'\n 264\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u5510\u4eba\u8857\u63a2\u68483']
types: [u'\n \u559c\u5267 / \u52a8\u4f5c / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 81628\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u59dc\u5b50\u7259']
types: [u'\n \u52a8\u753b\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 31699\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u4e2d\u56fd\u5973\u6392']
types: [u'\n \u5267\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646 / \u4e2d\u56fd\u9999\u6e2f\n ']
number: [u'\n 24300\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u7d27\u6025\u6551\u63f4']
types: [u'\n \u5267\u60c5 / \u52a8\u4f5c / \u707e\u96be\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 20230\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u56e7\u5988']
types: [u'\n \u5267\u60c5 / \u559c\u5267\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 18789\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u6025\u5148\u950b']
types: [u'\n \u52a8\u4f5c / \u5192\u9669\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 4625\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u91d1\u7985\u964d\u9b54']
types: [u'\n \u52a8\u4f5c / \u5947\u5e7b / \u5192\u9669\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 749\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u5927\u7ea2\u5305']
types: [u'\n \u559c\u5267 / \u7231\u60c5\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 511\u4eba\n ']
times: [u'\n 2020\u5e7401\u670825\u65e5\n\n \n ']
name: [u'\u5446\u74dc\u5144\u5f1f\u4e4b\u5feb\u4e50\u51ac\u5929']
types: [u'\n \u559c\u5267 / \u52a8\u753b\n ']
zone: [u'\n \u6377\u514b / \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 262\u4eba\n ']
times: [u'\n 2020\u5e7406\u670821\u65e5\n\n \n ']
name: [u'\u516d\u6708\u7684\u79d8\u5bc6']
types: [u'\n \u5267\u60c5 / \u60ac\u7591 / \u97f3\u4e50\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646 / \u7f8e\u56fd\n ']
number: [u'\n 1703\u4eba\n ']
times: [u'\n 2020\u5e7410\u670801\u65e5\n\n \n ']
name: [u'\u9ed1\u8272\u5047\u9762']
types: [u'\n \u5267\u60c5 / \u60ac\u7591\n ']
zone: [u'\n \u4e2d\u56fd\u5927\u9646\n ']
number: [u'\n 5038\u4eba\n ']
[Finished in 1.5s]
Feel free to take the script and use it. I simply printed the data without any cleaning, so process and analyze it however you need.
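For readers who want a starting point for that cleaning step, here is a minimal sketch. It is written for Python 3 rather than the Python 2 used in the script, and the helper names `first_text` and `clean_row` and the dictionary keys are hypothetical, not part of the original code. It assumes each field arrives as the one-element list that `xpath()` printed above, and that the last column is a number followed by 人 ("people"); the post does not say what that count measures, so the key is simply called `count`.

```python
import re


def first_text(values):
    """Return the first XPath text result, stripped of whitespace, or ''."""
    return values[0].strip() if values else ''


def clean_row(times, name, types, zone, number):
    """Turn the five raw XPath result lists for one table row into a dict.

    Hypothetical helper: the field names below are illustrative only.
    """
    digits = re.search(r'\d+', first_text(number))
    return {
        'date': first_text(times),
        'name': first_text(name),
        # Genres and regions are separated by ' / ' in the raw text.
        'genres': [g.strip() for g in first_text(types).split('/') if g.strip()],
        'regions': [z.strip() for z in first_text(zone).split('/') if z.strip()],
        # Keep only the digits from values such as '565人'.
        'count': int(digits.group()) if digits else None,
    }


# Raw values copied from the first record in the output above.
row = clean_row(
    ['\n 11\u670807\u65e5\n\n \n '],
    ['\u8d8a\u57df\u91cd\u751f'],
    ['\n \u52a8\u4f5c / \u72af\u7f6a / \u60ca\u609a\n '],
    ['\n \u7f8e\u56fd\n '],
    ['\n 565\u4eba\n '],
)
print(row)
```

Returning a plain dictionary per row keeps the sketch independent of any storage format; the rows could be collected in a list and then sorted, filtered, or written to a file as needed.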