Downloading movies from a video site with a Python scraper

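  1. I picked a VIP video site.
  2. Press F12 to open the developer tools and capture the network traffic.
  3. Search for a film by name and watch which URL the request is sent to.
    The request has to carry form data, and that data is exactly the film name we want.
    With that, we can write the first piece of code: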
```python
import requests


class Video():
    """Fetch movie information"""

    def __init__(self):
        self.query = input("Enter the movie or TV show you want to download: ")
        # query = "闪电侠"
        self.form_data = {"wd": self.query}  # the search form field carries the film name
        self.url = "http://m.6080w.com/index.php?m=vod-search"
        self.headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "Accept-Encoding": "gzip, deflate",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Cache-Control": "max-age=0",
            "Connection": "keep-alive",
            # No hard-coded Content-Length: requests computes it from the form data
            "Content-Type": "application/x-www-form-urlencoded",
            # "Cookie": "<paste your own browser's Cookie header here if the site requires it>",
            "Host": "m.6080w.com",
            "Origin": "http://m.6080w.com",
            "Referer": "http://m.6080w.com/",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36",
        }

    def Get_Response(self, url, headers):
        """POST the search form and return the decoded HTML"""
        response = requests.post(url=url, data=self.form_data, headers=headers)
        return response.content.decode()
```
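To see exactly what `data=self.form_data` puts in the request body, the standard library's `urllib.parse.urlencode` produces the same `application/x-www-form-urlencoded` string for a simple dict. A small sketch; `search_body` is a hypothetical helper, not part of the original code:

```python
from urllib.parse import urlencode


def search_body(query: str) -> str:
    """Form-encode the search box value ("wd") the way a browser POST does (UTF-8, percent-encoded)."""
    return urlencode({"wd": query})


if __name__ == "__main__":
    body = search_body("闪电侠")
    print(body)       # wd=%E9%97%AA%E7%94%B5%E4%BE%A0
    print(len(body))  # 30
```

The body for 闪电侠 is exactly 30 bytes, which matches the `Content-Length: 30` captured in the browser. That value only fits this one query, so it is better left for the HTTP library to compute.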

Approach: get the information for the target video > get its episode list > get the video's stream addresses

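  4. Take the URL from the result's summary card, which is relative, and manually prepend http:// and the domain.

  5. Next, use XPath to extract the URL.

  6. Next we reach the video's playback page. The video is loaded from dynamically generated addresses, and each address serves only a short segment of the video.

    Analyzing the requests, I found the file name at the end of the address.

    It turns out that a single file, the index.m3u8 playlist, stores the information for all of the segments.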
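The two link-handling steps above (extract the detail links, then prepend the scheme and domain to the relative URL) can be sketched with only the standard library. The original uses lxml XPath (`./div/dl/dt/a/@href`); this sketch swaps in `html.parser` to collect `<a href>` values inside `<dt>`, plus `urllib.parse.urljoin` to add the domain. The `DetailLinkParser` class, `detail_urls` function, and the HTML snippet are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "http://m.6080w.com/"  # domain used throughout the article


class DetailLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <dt>, roughly like //dt/a/@href."""

    def __init__(self):
        super().__init__()
        self.in_dt = 0
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.in_dt += 1
        elif tag == "a" and self.in_dt:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "dt" and self.in_dt:
            self.in_dt -= 1


def detail_urls(html: str, base: str = BASE) -> list:
    parser = DetailLinkParser()
    parser.feed(html)
    # urljoin handles "/Movie/1.html" and "Movie/1.html" alike, without doubling the slash
    return [urljoin(base, h) for h in parser.hrefs]


sample = '<dl><dt><a href="/Movie/243395.html" style="x"></a></dt><dd><a href="/skip">x</a></dd></dl>'
print(detail_urls(sample))  # ['http://m.6080w.com/Movie/243395.html']
```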

With that, we can build the URL of every segment and download them all, then join the segments together with video-merging software.
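The single file that holds all the segment information is an HLS playlist (m3u8): lines starting with `#` are tags, and every other non-empty line is a URI, relative to the playlist's own URL. A minimal sketch of turning a playlist into absolute URLs, assuming an unencrypted stream; `playlist_uris`, the playlist text, and the `cdn.example.com` host are made up:

```python
from urllib.parse import urljoin


def playlist_uris(playlist_url: str, text: str) -> list:
    """Return absolute URIs for every non-tag line of an m3u8 playlist."""
    uris = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # '#' lines are tags or comments
            uris.append(urljoin(playlist_url, line))
    return uris


master = "#EXTM3U\n#EXT-X-STREAM-INF:BANDWIDTH=1000000\n1000k/hls/index.m3u8\n"
variant = "#EXTM3U\n#EXT-X-TARGETDURATION:10\n#EXTINF:10.0,\nseg000.ts\n#EXTINF:8.5,\nseg001.ts\n#EXT-X-ENDLIST\n"

master_url = "https://cdn.example.com/20201108/abc/index.m3u8"
variant_url = playlist_uris(master_url, master)[0]
print(variant_url)  # https://cdn.example.com/20201108/abc/1000k/hls/index.m3u8
print(playlist_uris(variant_url, variant))
```

The source code below reaches the segment list by string-splitting on `index` and appending `1000k/hls/`; resolving each line against the playlist URL gives the same result for relative segment names and also copes with segments listed under a different path.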

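Here is the full source code. I split it into five modules that depend on one another; they sit in a package directory named 电影大全, which is what the imports refer to.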
def_main.py

```python
from 电影大全 import void
from 电影大全 import program_api


class Movie_House():
    def run(self):
        # Print the help text
        api = program_api.help_api()
        api.help()
        # Run the program
        movie = void.Down_video()
        movie.Run()


def main():
    m = Movie_House()
    m.run()


if __name__ == '__main__':
    main()
```

down_video.py

```python
import os
import time

import requests
from tqdm import tqdm


class Down():
    def __init__(self, video_name):
        self.video_name = video_name
        print("Expected format:")
        print("//www.6080w.com/prestrain.htmlhttps://zuidajiexi.net/m3u8.html?url=https://douban.donghongzuida.com/20201108/12160_6b904f04/index.m3u8")
        self.url = input("Enter the video URL: ")
        # The part after the last "=" is the m3u8 address; keep its directory
        base = self.url.split("=")[-1].split("index")[0]
        self.url2 = base + "1000k/hls/index.m3u8"  # playlist that lists the .ts segments
        self.url21 = base + "1000k/hls/"           # prefix for each segment URL
        self.headers3 = {
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36"
        }

    def get_respones(self, url, headers):
        ret = requests.get(url=url, headers=headers)
        return ret.content.decode()

    def D_file(self):
        """Create a folder named after the video to hold the segments"""
        path = str(self.video_name)
        if not os.path.exists(path):
            os.mkdir(path)
            print("Folder created")

    def run(self):
        response = self.get_respones(self.url2, self.headers3)
        # Every non-empty line that is not a '#' tag names one segment
        res = [line.strip() for line in response.splitlines()
               if line.strip() and not line.startswith("#")]
        print("Download starting, please wait......")
        self.D_file()
        start_time = time.time()
        for num, t in enumerate(tqdm(res, ncols=50)):
            time.sleep(0.05)  # be gentle with the server
            data = requests.get(url=self.url21 + t, headers=self.headers3).content
            # Zero-padded numbers keep the files in playback order when sorted by name
            file_name = os.path.join(self.video_name, "segment_{:05d}.ts".format(num))
            with open(file_name, 'wb') as f:
                f.write(data)
            tqdm.write("Segment {} saved".format(num))
        end_time = time.time()
        print("Video downloaded! It took {:.1f} seconds".format(end_time - start_time))
```

program_api.py

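```python
class help_api():
    def help(self):
        print("1. Run def_main.py to start the program\n"
              "2. Enter data as prompted\n"
              "3. Continue until this prompt appears: Please open the page address and copy the video URL\n"
              "4. Open the printed link (preferably in Chrome) and press F12 to open the developer tools\n"
              "5. On the Elements tab, press Ctrl+F to search the elements and paste in the XPath printed in the console\n"
              "6. Copy the URL found there and enter it at the video URL prompt\n"
              "7. Download a video merging tool (Format Factory is recommended)\n"
              "8. Merge all the segment files in the folder that was created")
```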
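The last step of the help text merges the segments with a GUI tool. Because HLS segments are MPEG-TS chunks, segments cut from one unencrypted stream can usually be joined simply by appending their bytes in playback order. A minimal sketch under that assumption; `merge_segments` and `segment_number` are hypothetical helpers. It sorts by the number in each file name, because a plain name sort would put 第10段视频.mp4 before 第2段视频.mp4:

```python
import re
from pathlib import Path


def segment_number(path: Path) -> int:
    """Pull the first run of digits out of a file name, e.g. 第12段视频.mp4 -> 12."""
    match = re.search(r"\d+", path.name)
    return int(match.group()) if match else -1


def merge_segments(folder: str, output: str) -> int:
    """Append every segment file in folder to output in numeric order; return how many were merged."""
    out_path = Path(output).resolve()
    files = sorted(
        (p for p in Path(folder).iterdir() if p.is_file() and p.resolve() != out_path),
        key=segment_number,
    )
    with open(out_path, "wb") as out:
        for p in files:
            out.write(p.read_bytes())
    return len(files)
```

For example, `merge_segments("闪电侠", "闪电侠.ts")` would merge the folder the downloader created for the query 闪电侠. If the playlist declares encryption (an `#EXT-X-KEY` tag), plain concatenation will not give a playable file, and a dedicated merging tool is the safer route.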

video_api.py

```python
import requests
from lxml import etree

BASE = "http://m.6080w.com/"
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36")
ACCEPT = ("text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,"
          "image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9")
# Paste the Cookie header from your own browser session here if the site requires it
COOKIE = ""


class Video():
    """Fetch movie information"""

    def __init__(self):
        self.query = input("Enter the movie or TV show you want to download: ")
        # query = "闪电侠"
        self.form_data = {"wd": self.query}
        self.url = BASE + "index.php?m=vod-search"
        common = {
            "Accept": ACCEPT,
            "Accept-Encoding": "gzip, deflate",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Host": "m.6080w.com",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": UA,
        }
        # Search request: a POST form (requests computes Content-Length itself)
        self.headers = {**common,
                        "Content-Type": "application/x-www-form-urlencoded",
                        "Origin": "http://m.6080w.com",
                        "Referer": BASE}
        # Detail-page request
        self.headers2 = {**common, "Referer": BASE + "index.php?m=vod-search"}
        # Playlist request
        self.headers3 = {"user-agent": UA}
        if COOKIE:
            for h in (self.headers, self.headers2, self.headers3):
                h["Cookie"] = COOKIE
        self.printed = 0  # running result number across all pages

    def Get_Response(self, url, headers, code=True):
        """Send a request (POST if code is True, otherwise GET) and return the decoded body"""
        if code:
            response = requests.post(url=url, data=self.form_data, headers=headers)
        else:
            response = requests.get(url=url, headers=headers)
        return response.content.decode()

    def Wash_Html_Data(self, srt_html):
        """Clean the search-results page"""
        html = etree.HTML(srt_html)
        # Group the page: one div per result
        div = html.xpath("//div[@class='container']/div[@class='row']/div/div/div")
        self.video_list = []
        self.url_list = []
        self.video_url = []
        for div_tmp in div:
            div_dict = {}
            # Movie title
            name = div_tmp.xpath("./div/dl/dd/div[@class='head']/h3/text()")
            div_dict['video_name'] = name[0] if name else None
            # Detail-page URL behind the cover (relative, so prepend the domain)
            href = div_tmp.xpath("./div/dl/dt/a/@href")
            div_dict['video_url'] = BASE + href[0] if href else None
            self.video_url.append(div_dict['video_url'])
            # Cover image URL, taken from url(...) in the style attribute
            style = div_tmp.xpath("./div/dl/dt/a/@style")
            div_dict['video_img_url'] = style[0].split("(")[-1].split(")")[0] if style else None
            # Rating
            score = div_tmp.xpath("./div/dl/dd/div/span/text()")
            div_dict['video_score'] = score[0] if score else None
            # Lead actors
            actors = div_tmp.xpath("./div/dl/dd/ul/li/text()")
            div_dict['video_header'] = actors[0] if actors else None
            # Not yet extracted: director, region, genre, language, year
            self.video_list.append(div_dict)
        # URL of the next results page (None on the last page)
        next_href = html.xpath("/html/body/div[1]/div/div[1]/div/div[22]/a[2]/@href")
        self.url_list.append(BASE + next_href[0] if next_href else None)
        return self.video_list, self.url_list, self.video_url

    def Print_Data(self, video_list):
        """Print the scraped video data"""
        for data in video_list:
            # Numbering continues across pages so it matches the list Run() collects
            self.printed += 1
            print()
            print("No. {}:".format(self.printed))
            print("Title: {}\nDetail URL: {}\nCover URL: {}\nRating: {}\nStarring: {}".format(
                data['video_name'], data['video_url'], data['video_img_url'],
                data['video_score'], data['video_header']))
            print("#" * 50)
            print()
```
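`Wash_Html_Data` takes the cover address out of the link's `style` attribute with `split("(")` and `split(")")`, which leaves any quotes around the URL in place (`url('a.jpg')` yields `'a.jpg'`). A slightly more forgiving variant with a regular expression; `cover_from_style` is a hypothetical helper and the style strings are made up:

```python
import re

# url( optional-quote  address  optional-quote )
URL_IN_STYLE = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")


def cover_from_style(style):
    """Return the address inside url(...) of a CSS style string, or None."""
    match = URL_IN_STYLE.search(style or "")
    return match.group(1) if match else None


print(cover_from_style("background: url(upload/vod/2020-11-08/16048116123.jpg);"))
print(cover_from_style("background-image: url('upload/vod/2020-07-24/159558122926.jpg')"))
```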

void.py

```python
from lxml import etree

from 电影大全 import down_video
from 电影大全 import video_api


class Down_video(video_api.Video):
    def url_video(self, url):
        """Collect the detail-page URLs from every results page"""
        for tmp in url:
            self.video_in_urls.append(tmp)

    def wash_html(self, str_html):
        """Clean the detail page: episode list plus title information"""
        html = etree.HTML(str_html)
        html_div = html.xpath("//div[@id='playlist1']/ul/li")  # one li per episode
        video_name_list = []
        video_name = {}
        dict_list = []
        for div in html_div:
            div_dict = {}
            # Episode label
            label = div.xpath("./a/text()")
            div_dict['video_code_number'] = label[0] if label else None
            # Matching URL (check the href list itself, not the dict)
            href = div.xpath("./a/@href")
            div_dict['video_url'] = "http://m.6080w.com/" + href[0] if href else None
            dict_list.append(div_dict)
        # Title
        title = html.xpath("//div[@class='hy-video-details clearfix']/div/dl/dd/div/h3/text()")
        video_name['video_name'] = title[0] if title else None
        # Title details
        info = html.xpath("//div[@class='hy-video-details clearfix']/div/dl/dd/ul/li/text()")
        video_name['video_data'] = info[0] if info else None
        video_name_list.append(video_name)
        return dict_list, video_name_list

    def index_html(self, str_html):
        """Print every segment URL of a player page (helper, not called by Run)"""
        html = etree.HTML(str_html)
        # The player iframe's src ends with ...=<m3u8 address>
        play_url = html.xpath("//div[@class='container']/div/div/div/div/div/table/tbody/tr/td/iframe/@src")
        if not play_url:
            return
        # e.g. https://youku.cdn3-okzy.com/20191209/4820_5020c357/1000k/hls/index.m3u8
        base = play_url[0].split("=")[-1].split("index")[0]
        response = self.Get_Response(url=base + "1000k/hls/index.m3u8", headers=self.headers3, code=False)
        for line in response.splitlines():
            if line.strip() and not line.startswith("#"):
                print(base + "1000k/hls/" + line.strip())

    def print_wash_data(self, wash_dict_list, wash_video_name_list):
        """Print the cleaned data, then start the download"""
        url = []
        info = wash_video_name_list[0]
        for data in wash_dict_list:
            print()
            print("Title: {}\nDetails: {}\nEpisode: {}\nLink: {}".format(
                info['video_name'], info['video_data'], data['video_code_number'], data['video_url']))
            url.append(data['video_url'])
            print("#" * 50)
            print()
        page = int(input("Enter the episode number you want to download: "))
        url.reverse()  # the page lists the episodes in reverse order
        urls = url[page - 1]
        print("Please open the page address and copy the video URL")
        print("The URL of this episode/movie is:\n{}".format(urls))
        print("xpath://div[@class='container']/div/div/div/div/div/table/tbody/tr/td/iframe/@src")
        down = down_video.Down(self.query)
        down.run()

    def content(self, url):
        """Request the detail page and show its episodes"""
        str_html = self.Get_Response(url=url, headers=self.headers2)
        wash_dict_list, wash_video_name_list = self.wash_html(str_html)
        self.print_wash_data(wash_dict_list=wash_dict_list, wash_video_name_list=wash_video_name_list)

    def Run(self):
        url = self.url
        print("Scraping video data, please wait...")
        self.video_in_urls = []
        while url is not None:
            str_html = self.Get_Response(url=url, headers=self.headers)
            video_lists, next_video_url, video_urls = self.Wash_Html_Data(str_html)
            self.url_video(url=video_urls)
            # Print this page's results
            self.Print_Data(video_list=video_lists)
            # Follow the next-page link (None on the last page)
            url = next_video_url[0]
        mv = int(input("Choose the movie you want to download: "))
        url = self.video_in_urls[mv - 1]
        self.content(url=url)
```
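`Run` follows the next-page link until `Wash_Html_Data` returns None. If the last page's "next" link pointed back at itself, that loop would never end. A sketch of the same traversal with a visited set and a page cap; `crawl_pages` is hypothetical, `fetch_page` stands in for the request-and-parse step, and the page map is made up:

```python
from typing import Callable, List, Optional, Tuple

# fetch_page(url) -> (items on that page, next-page URL or None)
Fetcher = Callable[[str], Tuple[List[str], Optional[str]]]


def crawl_pages(start: str, fetch_page: Fetcher, max_pages: int = 100) -> List[str]:
    """Collect items from a paginated listing; stop on None, a repeated URL, or max_pages."""
    items: List[str] = []
    seen = set()
    url: Optional[str] = start
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_items, url = fetch_page(url)
        items.extend(page_items)
    return items


pages = {
    "p1": (["a", "b"], "p2"),
    "p2": (["c"], "p2"),  # last page links to itself
}
print(crawl_pages("p1", pages.__getitem__))  # ['a', 'b', 'c']
```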

Result screenshots: segments extracted, segments merged, merged video playing.
