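Building an automated YouTube downloader in Python (full code included)
Workflow
1. POST
Following the first step of the approach, we first need to fetch the encrypted JS payload with a POST request. I use the third-party requests library for this; for background on web scraping, see my earlier article.
i. First, set up the headers for the POST request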
# set the headers or the website will not return information
# the cookies in here you may need to change
headers = {
    "cache-Control": "no-cache",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
              "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
    "content-type": "application/x-www-form-urlencoded",
    "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
              "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
              "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
              "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
              "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
              "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
    "origin": "https://en.savefrom.net",
    "pragma": "no-cache",
    "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
    "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
    "sec-ch-ua-mobile": "?0",
    "sec-fetch-dest": "iframe",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/87.0.4280.88 Safari/537.36"
}
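The cookie value may need to be changed; it is best to copy the one your own browser sends. What each header means is outside the scope of this article, so look them up in a search engine if you are curious.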
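If you copy the cookie string from your browser's developer tools, it can be handy to turn it into a dict first, since requests also accepts cookies as a dict through its cookies= argument. The following is only a minimal sketch: parse_cookie_header is a hypothetical helper name that does not appear in the original code, and it assumes the usual "name=value; name2=value2" layout.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a 'name=value; name2=value2' cookie string into a dict.

    Hypothetical helper for illustration; not part of the original script.
    """
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty fragments such as a trailing ';'
        # Split on the first '=' only, so values containing '=' survive.
        name, _, value = part.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    sample = "lang=en; country=CN; x-requested-with="
    print(parse_cookie_header(sample))
```

The resulting dict could then be passed as cookies=... instead of hard-coding the "cookie" header, which makes it easier to swap in fresh values.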
ii. Then set up the form parameters
# set the parameter, we can get from chrome
kv = {
    "sf_url": url,
    "sf_submit": "",
    "new": "1",
    "lang": "en",
    "app": "",
    "country": "cn",
    "os": "Windows",
    "browser": "Chrome"
}
The sf_url field is the URL of the YouTube video we want to download; the other parameters stay unchanged.
iii. Finally, send the POST request with the requests library
# do the POST request
r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
r.raise_for_status()
Note that the parameters are passed as data=kv, so they are sent as a form-encoded request body.
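To see roughly what a form-encoded body looks like, the sketch below encodes the same fields with the standard library. build_form_body is a hypothetical name used only for illustration; requests performs a comparable encoding internally when it is given a dict through data=, but the exact bytes it sends are not verified here.

```python
from urllib.parse import parse_qs, urlencode


def build_form_body(video_url: str) -> str:
    """Encode the savefrom form fields as application/x-www-form-urlencoded."""
    kv = {
        "sf_url": video_url,
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome",
    }
    return urlencode(kv)


if __name__ == "__main__":
    body = build_form_body("https://www.youtube.com/watch?v=YPvtz1lHRiw")
    print(body)
    # Decode it again to check that the video URL round-trips unchanged.
    print(parse_qs(body, keep_blank_values=True)["sf_url"])
```

This also shows why the "content-type" header is set to application/x-www-form-urlencoded: the body is a single key=value&key=value string, not JSON.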
iv. Wrap it in a function
import requests


def gethtml(url):
    # set the headers or the website will not return information
    # the cookies in here you may need to change
    headers = {
        "cache-Control": "no-cache",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
        "content-type": "application/x-www-form-urlencoded",
        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
        "origin": "https://en.savefrom.net",
        "pragma": "no-cache",
        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "iframe",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/87.0.4280.88 Safari/537.36"
    }
    # set the parameter, we can get from chrome
    kv = {
        "sf_url": url,
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome"
    }
    # do the POST request
    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
    r.raise_for_status()
    # get the result
    return r.text
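2. Call the decryption function
i. Analysis
The hard part here is running JavaScript code from Python. Solutions you can find online include PyV8; this article uses execjs. In the approach write-up we saw that the last few lines of the JS are the decryption call, so we only need to run the whole script once in execjs and then evaluate the decryption call on its own.
ii. Extract the JS part first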
# target (youtube address) url
url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
# get the target text
reo = gethtml(url)
# Remove the code from the head and tail (we need the javascript part; the information is stored encrypted in it)
reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
A regular expression would also work here, but since I am not yet fluent with regex I simply used split.
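For readers who prefer the regex route, here is a minimal sketch of an equivalent extraction. extract_first_script is a hypothetical name, and the sketch assumes the opening tag is spelled exactly as in the split version; it also raises a clear error when the tag is missing instead of failing with an IndexError.

```python
import re

# re.S lets '.' match newlines, since the script body spans many lines.
SCRIPT_RE = re.compile(r'<script type="text/javascript">(.*?)</script>', re.S)


def extract_first_script(html: str) -> str:
    """Return the body of the first inline text/javascript block."""
    match = SCRIPT_RE.search(html)
    if match is None:
        raise ValueError("no <script type=\"text/javascript\"> block found")
    return match.group(1)


if __name__ == "__main__":
    page = ('<html><script type="text/javascript">\nvar a = 1;\n</script>'
            '<script type="text/javascript">var b = 2;</script></html>')
    print(repr(extract_first_script(page)))
```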
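iii. Take the first decryption function as the one we use
If you fetch the results for several different videos, you will find that the decryption function is different every time, but it always sits at the same line position.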
# split into lines (helps us find the decrypt function in the last few lines)
reA = reo.split("\n")
# get the decrypt function
name = reA[len(reA) - 3].split(";")[0] + ";"
So name is our decryption call (not the best variable name, haha).
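The line-picking logic can be tried out offline on a made-up script. In the sketch below, extract_decrypt_call and the sample JS (including the decode function name) are hypothetical stand-ins, not what the site actually returns; the only thing carried over from the article is "take the third line from the end, keep the first statement, use what follows the equals sign".

```python
def extract_decrypt_call(js: str) -> str:
    """Return the expression on the right-hand side of the third-to-last line."""
    lines = js.split("\n")
    if len(lines) < 3:
        raise ValueError("script is shorter than expected")
    # First statement on that line, e.g. 'var out = decode("...")'.
    statement = lines[-3].split(";")[0]
    if "=" not in statement:
        raise ValueError("expected an assignment on the third-to-last line")
    # Split once so any later '=' stays inside the expression.
    return statement.split("=", 1)[1].strip()


if __name__ == "__main__":
    # Hypothetical stand-in for the real, obfuscated script.
    sample_js = "(function(){\n/* obfuscated body */\n})();\nvar out = decode('abc');\n\n"
    print(extract_decrypt_call(sample_js))
```

Like the original, this cuts the statement at the first semicolon, so it would truncate a call whose string argument itself contains a semicolon.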
iv. Run it with execjs
# use execjs to execute the js code
ct = execjs.compile(reo)
# do the decryption
text = ct.eval(name.split("=")[1].replace(";", ""))
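Taking only the part after the = and dropping the semicolon means we just call the function without doing the assignment. Of course, running the assignment plus decryption first and then reading the value would also work.
But this fails right away (if only it were that simple).
1. this, i.e. the window variable, does not exist
If I remember correctly, the error complains about this or $b. I tried removing every this, and also wrapping everything in a class (so that this would refer to the class), but neither worked. I then found that the npm package jsdom can emulate the window variable inside execjs (there is probably a better way), so we need to install npm and jsdom, then rewrite the code above: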
addition = """
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
XMLHttpRequest = window.XMLHttpRequest;
"""
# use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)
ct = execjs.compile(addition + reo, cwd=r'C:\Users\xxx\AppData\Roaming\npm\node_modules')
Here:
- cwd is the output of npm root -g, i.e. the path of npm's global modules directory
- addition is what emulates window
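Rather than hard-coding the modules path, it could be looked up at run time. The sketch below is one possible way to do that, assuming npm is on PATH; npm_global_root is a hypothetical helper, and it returns None instead of raising when npm is missing or the command fails.

```python
import shutil
import subprocess
from typing import Optional


def npm_global_root() -> Optional[str]:
    """Return the output of `npm root -g`, or None if it cannot be obtained."""
    npm = shutil.which("npm")  # also resolves npm.cmd on Windows via PATHEXT
    if npm is None:
        return None
    try:
        done = subprocess.run(
            [npm, "root", "-g"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return done.stdout.strip() or None


if __name__ == "__main__":
    print(npm_global_root())
```

The returned string could then be passed as the cwd argument in place of the literal path.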
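But then we run into the next error.
2. alert does not exist
This error occurs because calling alert under execjs makes no sense: there is no browser to show the dialog, and alert is normally provided by window, which we have replaced with our own. So we override alert at the start of the code (in effect, we define our own alert).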
# override the alert function, because the code uses it in one place
# and we cannot show an alert in execjs (it is meaningless); however, if we do not override it, the code will raise an error
reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
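Because this patch is a plain string replacement, it silently does nothing if the page ever stops containing the "(function(){" marker. A small sketch that keeps the same replacement but fails loudly in that case is shown here; stub_alert is a hypothetical name, and the sample input is made up.

```python
ALERT_STUB = "(function(){\nthis.alert=function(){};"


def stub_alert(js: str) -> str:
    """Insert a no-op alert after every '(function(){' opener."""
    patched = js.replace("(function(){", ALERT_STUB)
    if patched == js:
        # Nothing was replaced, so the override was not installed.
        raise ValueError("marker '(function(){' not found in script")
    return patched


if __name__ == "__main__":
    print(stub_alert("(function(){alert(1);})();"))
```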
v. Put the code together
# target (youtube address) url
url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
# get the target text
reo = gethtml(url)
# Remove the code from the head and tail (we need the javascript part; the information is stored encrypted in it)
reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
# override the alert function, because the code uses it in one place
# and we cannot show an alert in execjs (it is meaningless); however, if we do not override it, the code will raise an error
reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
# split into lines (helps us find the decrypt function in the last few lines)
reA = reo.split("\n")
# get the decrypt function
name = reA[len(reA) - 3].split(";")[0] + ";"
# add jsdom into the execjs because the code will use it (maybe there is a solution without jsdom, but I have no idea)
addition = """
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
XMLHttpRequest = window.XMLHttpRequest;
"""
# use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)
ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')
# do the decryption
text = ct.eval(name.split("=")[1].replace(";", ""))
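3. Analyze the decrypted result
i. Extract the key JSON
After the code above has run, the decrypted result is stored in text. As we saw in the approach write-up, the part that really matters is the JSON passed to window.parent.sf.videoResult.show(), so we extract that JSON with a regular expression.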
# get the result in json (raw string avoids invalid-escape warnings; group(1) is the text inside show(...))
result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)
ii. Parse the JSON
Python has many libraries that can parse JSON; here I use the json library (remember to import it).
# use `json` to load json
j = json.loads(result)
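The extract-then-parse step can be checked without running any JavaScript. The sketch below uses a made-up decrypted string; extract_result, the sample text, and the example.invalid URL are all hypothetical, and it uses re.S (so the JSON may span lines) rather than the flags in the original call.

```python
import json
import re

# Capture everything between 'show(' and the first ');;'.
SHOW_RE = re.compile(r"show\((.*?)\);;", re.S)


def extract_result(text: str) -> dict:
    """Pull the JSON argument out of a show(...);; call and parse it."""
    match = SHOW_RE.search(text)
    if match is None:
        raise ValueError("no show(...);; call found in decrypted text")
    return json.loads(match.group(1))


if __name__ == "__main__":
    # Hypothetical decrypted output, shaped like the fields the article mentions.
    sample = ('window.parent.sf.videoResult.show({"url": [{"url": '
              '"https://example.invalid/v.mp4", "quality": "360", "type": "mp4"}]});;')
    data = extract_result(sample)
    print(data["url"][0]["quality"])
```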
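iii. Get the download URL
Now for the last step. From the approach write-up and a JSON formatting tool we can see that j["url"][num]["url"] is the download link, where num selects the video format we want (different resolutions and types).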
# the selection of video; in this case, num=1 means the video is
# - 360p known from j["url"][num]["quality"]
# - MP4 known from j["url"][num]["type"]
# - audio known from j["url"][num]["audio"]
num = 1
downurl = j["url"][num]["url"]
# do some download
# thanks :)
# - EOF -
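Instead of a fixed index, the entry could be chosen by its fields, and the "do some download" placeholder could be filled in with a streaming copy. The following is only a sketch: pick_format and download are hypothetical helpers, the exact values stored under "quality" and "type" are assumptions (check the real JSON), and whether the link needs extra headers or expires is not covered.

```python
import shutil
import urllib.request


def pick_format(result: dict, quality: str, ext: str) -> dict:
    """Return the first entry whose quality and type match (type is case-insensitive)."""
    for entry in result["url"]:
        same_quality = str(entry.get("quality")) == quality
        same_type = str(entry.get("type", "")).lower() == ext.lower()
        if same_quality and same_type:
            return entry
    raise LookupError("no entry with quality=%r type=%r" % (quality, ext))


def download(url: str, dest: str) -> None:
    """Stream the response body to dest without loading it all into memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


if __name__ == "__main__":
    # Hypothetical parsed result with the field names mentioned above.
    j = {"url": [
        {"url": "https://example.invalid/hd.mp4", "quality": "720", "type": "mp4"},
        {"url": "https://example.invalid/sd.mp4", "quality": "360", "type": "MP4"},
    ]}
    print(pick_format(j, "360", "mp4")["url"])
```

With a real result, download(pick_format(j, "360", "mp4")["url"], "video.mp4") would be the natural next call.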
4. Full code
# -*- coding: utf-8 -*-
# @Time: 2021/1/10
# @Author: Eritque arcus
# @File: Youtube.py
# @License: MIT
# @Environment:
# - windows 10
# - python 3.6.2
# @Dependence:
# - jsdom in npm (also works on Windows)
# - requests, execjs, re, json in python
import requests
import execjs
import re
import json


def gethtml(url):
    # set the headers or the website will not return information
    # the cookies in here you may need to change
    headers = {
        "cache-Control": "no-cache",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
        "content-type": "application/x-www-form-urlencoded",
        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
        "origin": "https://en.savefrom.net",
        "pragma": "no-cache",
        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "iframe",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/87.0.4280.88 Safari/537.36"
    }
    # set the parameter, we can get from chrome
    kv = {
        "sf_url": url,
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome"
    }
    # do the POST request
    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
    r.raise_for_status()
    # get the result
    return r.text


if __name__ == '__main__':
    # target (youtube address) url
    url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
    # get the target text
    reo = gethtml(url)
    # Remove the code from the head and tail (we need the javascript part; the information is stored encrypted in it)
    reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
    # override the alert function, because the code uses it in one place
    # and we cannot show an alert in execjs (it is meaningless); however, if we do not override it, the code will raise an error
    reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
    # split into lines (helps us find the decrypt function in the last few lines)
    reA = reo.split("\n")
    # get the decrypt function
    name = reA[len(reA) - 3].split(";")[0] + ";"
    # add jsdom into the execjs because the code will use it (maybe there is a solution without jsdom, but I have no idea)
    addition = """
    const jsdom = require("jsdom");
    const { JSDOM } = jsdom;
    const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
    window = dom.window;
    document = window.document;
    XMLHttpRequest = window.XMLHttpRequest;
    """
    # use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)
    ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')
    # do the decryption
    text = ct.eval(name.split("=")[1].replace(";", ""))
    # get the result in json (raw string avoids invalid-escape warnings; group(1) is the text inside show(...))
    result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)
    # use `json` to load json
    j = json.loads(result)
    # the selection of video; in this case, num=1 means the video is
    # - 360p known from j["url"][num]["quality"]
    # - MP4 known from j["url"][num]["type"]
    # - audio known from j["url"][num]["audio"]
    num = 1
    downurl = j["url"][num]["url"]
    # do some download
    # thanks :)
    # - EOF -
- 102 lines in total in the original file
- Development environment
# @Environment:
# - windows 10
# - python 3.6.2
- Dependencies
# @Dependence:
# - jsdom in npm (also works on Windows)
# - requests, execjs, re, json in python
-end-
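Copyright notice: this is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Author: https://www.cnblogs.com/Eritque-arcus/ or https://blog.csdn.net/qq_40832960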