Downloading Xigua Video with a Python Crawler

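1. Introduction

Xigua Video is free to watch, so any video on the site can be downloaded. You must supply the URL of the video's detail page. Videos are fetched at 720P by default; a full-length movie takes a little over 1 GB of disk space.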

2. Python

Requirements file

requests==2.21.0
lxml==4.3.0

3. Code

import json
import os
import re
from base64 import b64decode

import requests
from lxml import etree


class XiGuaSpider:

    def __init__(self):
        # Request headers copied from a browser session on www.ixigua.com
        self.headers = {
            'Referer': 'https://www.ixigua.com',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
            'cookie': 'wafid=5b014d14-4285-413a-9dac-80e467ad5b4e; wafid.sig=Coe4SV6gStmKvfg897vmEd6h4_k; ttwid=6827754753289045508; ttwid.sig=hZChEHZDh7I1GdyST_waUAu31MA; xiguavideopcwebid=6827754753289045508; xiguavideopcwebid.sig=PPinOOHLyRkB7vLZeARYw7faelQ; SLARDAR_WEB_ID=91892236-9025-4d06-9d8e-1ec85233c784; _ga=GA1.2.15757958.1589710551; ixigua-a-s=1; _gid=GA1.2.1010024108.1589876453; s_v_web_id=kadne2g1_M7vfty8L_ecKr_47jB_8GWc_ctMlrqAXOgQy; _gat_gtag_UA_138710293_1=1',
        }
        self.video_dirs = './video'

    def download_file(self, file_path, download_url):
        print('*' * 100)
        print(f"Save path: {file_path}")
        print(f'Download URL: {download_url}')
        response = requests.get(url=download_url, headers=self.headers, stream=True)
        content_size = int(response.headers["content-length"])  # total size of the video in bytes
        size = 0
        with open(file_path, "wb") as file:  # non-text data is written in binary mode
            for data in response.iter_content(chunk_size=1024):  # write chunk by chunk
                file.write(data)  # write to the video file
                file.flush()  # flush the buffer
                size += len(data)  # accumulate the bytes written so far
                # print the download progress
                print("\rDownload progress: %d%% (%0.2fMB/%0.2fMB)" % (
                    float(size / content_size * 100), (size / 1024 / 1024),
                    (content_size / 1024 / 1024)),
                    end=" ")
        print()

    def get_response(self, url):
        # Returns None when the request raises (e.g. a network error)
        response = None
        try:
            response = requests.get(url, headers=self.headers)
        except Exception as e:
            print(e)
        return response

    def parse_detail(self, url):
        response = self.get_response(url)
        if not response:  # None, or a 4xx/5xx response
            return
        html = response.text
        document = etree.HTML(html)
        title = ''.join(document.xpath('//*[@class="hasSource"]/text()'))
        if not title:
            title = ''.join(document.xpath('//*[@class="teleplayPage__Description__header"]/h1/text()'))
        # Replace everything except CJK characters, digits and ASCII letters with "-"
        title = re.sub(u"([^\u4e00-\u9fa5\u0030-\u0039\u0041-\u005a\u0061-\u007a])", "-", title)
        pattern = r'\
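
Two steps of the listing above can be tried without touching the network: the title clean-up from parse_detail and the chunked write with progress reporting from download_file. The following is only a minimal sketch of those two steps, not the author's code. The names sanitize_title, write_chunks and _UNSAFE are hypothetical, and handling a missing Content-Length header is an added assumption that the original listing does not make.

```python
import io
import re

# Same character class as the listing: keep CJK ideographs, digits and
# ASCII letters; every other character is replaced with "-".
_UNSAFE = re.compile(r"[^\u4e00-\u9fa50-9A-Za-z]")


def sanitize_title(title):
    """Return a version of a page title that is safe to use as a file name."""
    return _UNSAFE.sub("-", title)


def write_chunks(chunks, out, total_size=None):
    """Write an iterable of byte chunks to `out` and print progress.

    `total_size` may be None (assumption: the server sent no
    Content-Length); then only the megabytes written are shown.
    Returns the number of bytes written.
    """
    written = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        out.write(chunk)
        written += len(chunk)
        done_mb = written / 1024 / 1024
        if total_size:
            percent = written * 100 // total_size
            total_mb = total_size / 1024 / 1024
            print(f"\rProgress: {percent}% ({done_mb:.2f}MB/{total_mb:.2f}MB)", end="")
        else:
            print(f"\rProgress: {done_mb:.2f}MB", end="")
    print()
    return written


if __name__ == "__main__":
    print(sanitize_title("西瓜 video: part 1"))
    buffer = io.BytesIO()
    write_chunks([b"a" * 1024, b"b" * 1024], buffer, total_size=2048)
```

With requests, the chunks would come from response.iter_content(...) on a request made with stream=True, and total_size from response.headers.get("content-length") converted to int when present. Keeping the write loop independent of the HTTP call makes it easy to check with an in-memory buffer before pointing it at a file of 1 GB or more.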
