Python web crawler: downloading videos with multiple threads

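You find a site online, but downloading its videos by hand one at a time is too tedious. What to do? A certain "Thunder" download manager? Outdated.
Steps for a web crawler that downloads videos with multiple threads: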

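Use requests to fetch the page content and regular expressions to extract URLs
Parse the HTML pages to get the mp4 addresses
Put the URLs into a thread-safe queue
Have multiple threads take mp4 addresses from the queue and download them concurrently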
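As a small illustration of the first two steps, the sketch below runs the same kind of href pattern against an in-memory HTML string instead of a live page. SAMPLE_HTML, HREF_PATTERN, and extract_video_paths are hypothetical names made up for this example, and the markup is invented; a real site will need its own pattern.

```python
import re

# Hypothetical sample of a listing page; real markup differs per site.
SAMPLE_HTML = '''
<a href="/2021/clip-a.html">A</a>
<a href="/about">About</a>
<a href="/2022/clip-b.html">B</a>
<a href="/2021/clip-a.html">A again</a>
'''

# Capture only the path of links shaped like /<4 digits>/<anything>.
HREF_PATTERN = re.compile(r'href="(/\d{4}/[^"]*)"')


def extract_video_paths(html):
    """Return the unique matching paths, in order of first appearance."""
    seen = set()
    paths = []
    for path in HREF_PATTERN.findall(html):
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


print(extract_video_paths(SAMPLE_HTML))
```

Using a capture group means the 'href="' prefix does not have to be stripped afterwards, and the explicit seen set removes duplicates without losing page order.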

import requests
import re
import os
import queue
import threading
import shutil


def download_start():
    download_url_queue = queue.Queue(3)   # bounded, thread-safe queue of mp4 URLs
    mp4_code_set = set()                  # codes that have already been seen
    page = 10
    store_location = '/Users/Downloads/.dyxx/'    # directory where files are stored
    download_site_home = "https://xxxxxx.com/"   # site to download from; you will have to find one yourself
    mp4_api_url = 'https://api.xxxxxx.com/get-mp4-url?code='  # the video page yields a code; the code yields the playable mp4 URL

    # Make sure the target directory exists before any worker opens a file in it.
    os.makedirs(store_location, exist_ok=True)

    def download():
        # Worker: take mp4 URLs from the queue and save them to disk.
        while True:
            # get() blocks until an item is available, so the thread does not spin.
            mp4_url = download_url_queue.get()
            try:
                file_path = store_location + mp4_url[-15:]
                if not os.path.exists(file_path):
                    print('Download start::::' + mp4_url)
                    # Check the content type with a HEAD request before downloading.
                    res_header = requests.head(mp4_url)
                    if res_header.headers['Content-Type'] == 'video/mp4':
                        with open(file_path, "wb") as f, requests.get(mp4_url, stream=True) as res:
                            shutil.copyfileobj(res.raw, f)
                        print('Download end::::' + mp4_url)
            except Exception as ee:
                print(str(ee))

    # Start 5 download threads.
    for t in range(5):
        threading.Thread(target=download).start()

    # Producer: walk the listing pages and queue mp4 URLs. Runs until interrupted.
    while True:
        try:
            download_pages = download_site_home + '?page=' + str(page)
            res = requests.get(download_pages)
            if res.status_code == 200:
                # Links to the play pages look like href="/<4 digits>/..."
                re_href = re.compile(r'href="/\d{4}/[^"]*')
                all_href = re_href.findall(res.text)
                all_href.reverse()
                all_href_set = set(all_href[15:-15])
                for href_item in all_href_set:
                    play_page = download_site_home + href_item.replace('href="/', '')
                    play_page_res = requests.get(play_page)
                    if play_page_res.status_code == 200:
                        play_page_text = play_page_res.text
                        re_play_code = re.compile(r'data-code="[^"]*')
                        mp4_play_codes = re_play_code.findall(play_page_text)
                        mp4_play_codes_set = set(mp4_play_codes)
                        for code in mp4_play_codes_set:
                            param_code = code.replace('data-code="', '')
                            if param_code in mp4_code_set:
                                break
                            else:
                                mp4_code_set.add(param_code)
                            # Ask the API for the mp4 URL that belongs to this code.
                            mp4url = mp4_api_url + param_code
                            mp4res = requests.get(mp4url)
                            if mp4res.status_code == 200:
                                file_path = store_location + mp4res.text[-15:]
                                if os.path.exists(file_path):
                                    break
                                print(mp4res.text + '   ' + param_code)
                                # put() blocks while the queue is full.
                                download_url_queue.put(mp4res.text)
            page = page + 1
        except Exception as e:
            # The page number is not advanced when a request raises, so the same page is retried.
            print(str(e))


if __name__ == '__main__':
    download_start()
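The script above runs until it is interrupted. For a finite list of URLs, one common variant of the same queue-plus-threads pattern is to push a sentinel value per worker so the threads exit cleanly. The sketch below is only an illustration of that idea, not the script's method: run_pool, fake_download, STOP, and NUM_WORKERS are hypothetical names, and fake_download stands in for the real network download so the example runs offline.

```python
import queue
import threading

NUM_WORKERS = 3
STOP = None  # sentinel that tells a worker to exit


def run_pool(urls, handle):
    """Feed urls through a bounded queue to NUM_WORKERS threads."""
    url_queue = queue.Queue(maxsize=3)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            url = url_queue.get()      # blocks until an item is available
            if url is STOP:
                break
            try:
                outcome = handle(url)
                with results_lock:
                    results.append(outcome)
            except Exception as exc:
                print('failed: %s (%s)' % (url, exc))

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for url in urls:
        url_queue.put(url)             # blocks while the queue is full
    for _ in threads:
        url_queue.put(STOP)            # one sentinel per worker
    for t in threads:
        t.join()
    return results


def fake_download(url):
    # Stand-in for the real download step; it only derives a file name.
    return url[-15:]


demo_urls = ['https://example.com/v/%04d-video.mp4' % i for i in range(10)]
print(sorted(run_pool(demo_urls, fake_download)))
```

Because the queue is bounded, the producer is slowed to the pace of the workers rather than buffering every URL in memory, which is the same effect queue.Queue(3) has in the script.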
