Scraping movie site information with Python and writing it to a file

# Listing page being scraped: https://www.domp4.com/list/6-1.html
import requests
from bs4 import BeautifulSoup


def get_url_content(url):
    # Fetch the page source; return False on any non-200 response.
    # The timeout keeps a stalled connection from hanging the script.
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        return response.text
    return False


def parse_Web_Content(content):
    soup = BeautifulSoup(content, 'html.parser')

    filmName = get_film_name(soup)
    filmCast = get_film_cast(soup)
    filmIntro = get_film_introduction(soup)
    filmUrl = get_film_url(soup)

    # zip() stops at the shortest list, so a page with a missing field
    # cannot raise IndexError the way indexing by position could.
    film = []
    for name, cast, intro, href in zip(filmName, filmCast, filmIntro, filmUrl):
        film.append({
            'filmName': name,
            'filmCast': cast,
            'filmIntro': intro,
            'filmurl': 'https://www.domp4.com' + href,
        })
    return film


def get_film_name(soup):
    # The title is the text of the link inside each .play_info block.
    return [item.a.get_text() for item in soup.select('.play_info')]


def get_film_cast(soup):
    # The cast line is a <p class="space"> element.
    return [p.text for p in soup.find_all('p', attrs={'class': 'space'})]


def get_film_introduction(soup):
    # The introduction is a <p class="content"> element.
    return [p.text for p in soup.find_all('p', attrs={'class': 'content'})]


def get_film_url(soup):
    # The relative detail-page URL is the href of the same .play_info link.
    return [item.a['href'] for item in soup.select('.play_info')]


def writeTofile(parsedWebcontent):
    # Append one tab-separated line per film; the with block closes the file.
    with open('film.txt', 'a', encoding='utf-8') as f:
        for film in parsedWebcontent:
            fields = (film['filmName'], film['filmCast'],
                      film['filmIntro'], film['filmurl'])
            f.write('\t'.join(fields) + '\n')


link = "https://www.domp4.com/list/6-"
for i in range(1, 4):  # listing pages 1 to 3
    url = link + str(i) + ".html"
    webContent = get_url_content(url)

    if webContent is not False:
        Content = parse_Web_Content(webContent)
        writeTofile(Content)
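
The script above stores each film as one tab-separated line, which can break if an introduction itself contains a tab or a line break. One possible alternative, sketched here with the standard-library csv module, is to let the writer quote such fields. The function names write_films_csv and read_films_csv, the file name film.csv, and the sample record (including its URL) are made up for illustration and are not part of the original script; the dictionary keys assume the film name is stored under 'filmName'.

```python
import csv
import os
import tempfile

# Column order for the output file (assumed key names).
FIELDS = ["filmName", "filmCast", "filmIntro", "filmurl"]


def write_films_csv(films, path):
    """Append film dicts to a CSV file, adding a header row if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(films)


def read_films_csv(path):
    """Read the file back into a list of dicts."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


# Made-up record for illustration; note the newline and tab inside the intro.
sample = [{
    "filmName": "Example Film",
    "filmCast": "Actor A / Actor B",
    "filmIntro": "First line.\nSecond line\twith a tab.",
    "filmurl": "https://www.domp4.com/example.html",
}]

path = os.path.join(tempfile.mkdtemp(), "film.csv")
write_films_csv(sample, path)
write_films_csv(sample, path)  # second call appends without repeating the header
rows = read_films_csv(path)
print(len(rows), rows[0] == sample[0])
```

Because the writer quotes any field containing a line break, the introduction survives a round trip unchanged, and the header row makes the file easier to open in a spreadsheet or load with other tools.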

Reposted from: https://www.cnblogs.com/kevin162726/p/10765321.html
