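Procedure

1. POST

Following step one of the approach section, we first fetch the encrypted JS payload with a POST request. I use the third-party requests library for this; for background on web scraping, see my earlier articles.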

i. Format the headers for the POST request

# Set the headers, or the website will not return any information.
# You may need to change the cookie below.
headers = {
    "cache-Control": "no-cache",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
              "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
    "content-type": "application/x-www-form-urlencoded",
    "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
              "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
              "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
              "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
              "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
              "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
    "origin": "https://en.savefrom.net",
    "pragma": "no-cache",
    "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
    "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
    "sec-ch-ua-mobile": "?0",
    "sec-fetch-dest": "iframe",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/87.0.4280.88 Safari/537.36"
}

The cookie value may need changing; it is best to copy the one your own browser sends. What each header means is outside the scope of this article, but a search engine will tell you.
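One way to keep the headers in line with your own browser is to paste the raw request-header block from the browser's network panel and turn it into a dict. This is only a minimal sketch: parse_raw_headers is a hypothetical helper (it is not part of requests), and the sample text is made up for illustration.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn 'name: value' lines, as copied from a browser, into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # skip blank lines and HTTP/2 pseudo-headers such as ':authority'
        if not line or line.startswith(":"):
            continue
        # split on the first colon only, so URLs in the value stay intact
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


# made-up sample; paste your own copied headers here instead
sample = """
origin: https://en.savefrom.net
pragma: no-cache
cookie: lang=en; country=CN
"""
headers = parse_raw_headers(sample)
print(headers["origin"])
```

The resulting dict can be passed as headers= in the same way as the hand-written one.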

ii. Then format the parameters

# Set the parameters; they can be read from Chrome
kv = {
    "sf_url": url,
    "sf_submit": "",
    "new": "1",
    "lang": "en",
    "app": "",
    "country": "cn",
    "os": "Windows",
    "browser": "Chrome"
}

The sf_url field is the URL of the YouTube video we want to download; the other parameters stay unchanged.

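iii. Finally, send the POST request with requests

# do the POST request
r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
r.raise_for_status()

Note that the parameters are passed as data=kv, so they are sent as a form-encoded request body (matching the content-type header above) rather than as a query string.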
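To see what a form-encoded body looks like, the standard library can encode the same kind of dict. This sketch does not call requests at all; it only illustrates the encoding that requests applies when a dict is passed as data=. The kv dict is shortened here for brevity.

```python
from urllib.parse import urlencode

# a shortened version of the kv dict, for illustration only
kv = {
    "sf_url": "https://www.youtube.com/watch?v=YPvtz1lHRiw",
    "sf_submit": "",
    "new": "1",
}

# key=value pairs joined with '&', with reserved characters percent-encoded
body = urlencode(kv)
print(body)
```

Passing the dict as params= instead would append the same pairs to the URL and leave the body empty, which is not what the form on the site sends.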

iv. Wrap it in a function

import requests


def gethtml(url):
    # Set the headers, or the website will not return any information.
    # You may need to change the cookie below.
    headers = {
        "cache-Control": "no-cache",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
        "content-type": "application/x-www-form-urlencoded",
        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
        "origin": "https://en.savefrom.net",
        "pragma": "no-cache",
        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "iframe",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/87.0.4280.88 Safari/537.36"
    }
    # Set the parameters; they can be read from Chrome
    kv = {
        "sf_url": url,
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome"
    }
    # do the POST request
    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
    r.raise_for_status()
    # return the response text
    return r.text

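2. Call the decryption function

i. Analysis

The hard part here is running JavaScript code from Python. Solutions available online include PyV8 and others; this article uses execjs. In the approach section we saw that the last few lines of the JS are the decryption call, so we only need to run the whole script once in execjs and then run the decryption function on its own.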

ii. Extract the JS part

# target (YouTube) URL
url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
# get the response text
reo = gethtml(url)
# Strip the code before and after the script (we need the JavaScript part;
# the information is stored, encrypted, inside it)
reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]

A regular expression would also work here, but since I am not yet fluent in regex I simply used split.
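For readers who prefer the regular-expression route, a sketch of an equivalent extraction follows. extract_script is a hypothetical helper name and the sample HTML is made up; the pattern assumes the opening tag is written exactly as in the split version above.

```python
import re

SCRIPT_RE = re.compile(
    r'<script type="text/javascript">(.*?)</script>',
    re.S,  # let '.' match newlines, since the script spans many lines
)


def extract_script(html: str) -> str:
    """Return the body of the first inline text/javascript block."""
    match = SCRIPT_RE.search(html)
    if match is None:
        raise ValueError("no inline text/javascript block found")
    return match.group(1)


# made-up sample page
sample = '<html><script type="text/javascript">\nvar a = 1;\n</script><script>x</script></html>'
print(extract_script(sample))
```

Unlike the chained split, this raises a descriptive error instead of an IndexError when the page contains no such script block.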

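iii. Take the first decryption call as our decryption function

If you fetch the results for several different videos, you will find that the decryption function is different every time, but it always sits on the same fixed line.

# split into lines (this helps us find the decryption call in the last few lines)
reA = reo.split("\n")
# get the decryption call
name = reA[len(reA) - 3].split(";")[0] + ";"

So name is our decryption function (not the best variable name, haha).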
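The line-picking step can be tried on a toy script to see exactly which piece it keeps. Everything in the sample (the decode and other functions, the variable out) is made up; the real script returned by the site looks different. pick_call_line is a hypothetical helper name.

```python
def pick_call_line(script: str, offset_from_end: int = 3) -> str:
    """Return the first statement on the line `offset_from_end` lines from the end."""
    lines = script.split("\n")
    if len(lines) < offset_from_end:
        raise ValueError("script is shorter than expected")
    # keep only the text up to the first ';' and put the ';' back
    return lines[-offset_from_end].split(";")[0] + ";"


# toy script: the trailing newline yields an empty last element after split
toy = "var x = 1;\nvar out = decode(x); other();\n})();\n"
name = pick_call_line(toy)
print(name)
```

If the site ever changes the position of that line, the offset is the one value that would need adjusting.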

iv. Run it with execjs

import execjs

# use execjs to compile the JS code
ct = execjs.compile(reo)
# do the decryption
text = ct.eval(name.split("=")[1].replace(";", ""))

Keeping only what follows the = and dropping the semicolon means we just call the function without assigning its result. Of course, running the assignment first and then reading the variable would work too.
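A slightly more defensive variant of that string handling is sketched below. It is a variation, not the code used in this article: it splits on the first = only, so an = inside the call's arguments survives, and it strips only the trailing semicolon. call_expression is a hypothetical helper and the statements are made up.

```python
def call_expression(statement: str) -> str:
    """Turn 'var out = f(x);' into the bare expression 'f(x)'."""
    # partition splits on the first '=' only
    _, sep, rhs = statement.partition("=")
    if not sep:
        raise ValueError("statement contains no assignment")
    return rhs.strip().rstrip(";")


print(call_expression("var out = decode(x);"))
print(call_expression('var out = decode("k=v");'))
```

The returned string is what would be handed to ct.eval.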

But this fails with an error right away (if only it were that simple).

1. this, i.e. the window variable, does not exist

If I remember correctly, the error is about this or $b. I tried removing every this, and also wrapping everything in a class (so that this would refer to that class), but neither worked. I then found that the npm package jsdom can emulate the window variable inside execjs (there is probably a better way), so we need to install npm and the jsdom package, and then rewrite the code above:

addition = """
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
XMLHttpRequest = window.XMLHttpRequest;
"""
# use execjs to compile the JS code; cwd is the output of `npm root -g`
# (the path of npm's modules on your computer)
ct = execjs.compile(addition + reo, cwd=r'C:\Users\xxx\AppData\Roaming\npm\node_modules')

Here:

cwd is the output of npm root -g, i.e. the path of npm's global modules
addition is used to emulate window
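Instead of hard-coding that path, it can be looked up at run time by calling npm root -g from Python. This is a sketch under assumptions: global_node_modules is a hypothetical helper, and the cmd parameter exists only so the demo line can run with a stand-in command on machines without npm.

```python
import shutil
import subprocess
import sys


def global_node_modules(cmd=None) -> str:
    """Return the stripped output of `npm root -g` (or of cmd, if given)."""
    if cmd is None:
        # shutil.which also finds npm.cmd on Windows via PATHEXT
        npm = shutil.which("npm")
        if npm is None:
            raise FileNotFoundError("npm was not found on PATH")
        cmd = [npm, "root", "-g"]
    # check_output raises CalledProcessError on a non-zero exit status
    return subprocess.check_output(cmd, universal_newlines=True).strip()


# demo with a stand-in command, so the sketch runs without npm installed
print(global_node_modules([sys.executable, "-c", "print('/tmp/node_modules')"]))
```

With npm installed, global_node_modules() with no argument should give the value to pass as cwd=.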

But then we run into the next error.

2. alert does not exist

This error arises because calling alert under execjs is meaningless: there is no browser to show the popup. In addition, alert is originally defined on window, and we have supplied our own window. So we must override alert before the code runs (in effect, define an alert of our own).

# Override the alert function: the script calls it in one place, and alerting
# in execjs is meaningless. Without the override the code raises an error.
reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
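One detail of str.replace worth knowing here: without a count argument it rewrites every occurrence of the pattern, not just the first. The toy script below is made up and only shows the difference; whether the real script contains more than one such wrapper is something I have not verified.

```python
# toy script with two function wrappers (made up for illustration)
script = "(function(){ a(); })();\n(function(){ b(); })();"
stub = "(function(){\nthis.alert=function(){};"

# no count: every wrapper gets the stub
patched_all = script.replace("(function(){", stub)
# count=1: only the first wrapper gets the stub
patched_first = script.replace("(function(){", stub, 1)

print(patched_all.count("this.alert"), patched_first.count("this.alert"))
```

Injecting the empty alert into every wrapper is harmless, so the unlimited form used above is a reasonable default.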

v. Put the code together

import execjs

# target (YouTube) URL
url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
# get the response text
reo = gethtml(url)
# Strip the code before and after the script (we need the JavaScript part;
# the information is stored, encrypted, inside it)
reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
# Override the alert function: the script calls it in one place, and alerting
# in execjs is meaningless. Without the override the code raises an error.
reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
# split into lines (this helps us find the decryption call in the last few lines)
reA = reo.split("\n")
# get the decryption call
name = reA[len(reA) - 3].split(";")[0] + ";"
# add jsdom to execjs because the code uses window
# (maybe there is a solution without jsdom, but I have no idea)
addition = """
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
XMLHttpRequest = window.XMLHttpRequest;
"""
# use execjs to compile the JS code; cwd is the output of `npm root -g`
# (the path of npm's modules on your computer)
ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')
# do the decryption
text = ct.eval(name.split("=")[1].replace(";", ""))

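3. Parse the decrypted result

i. Extract the key JSON

After the code above has run, the decrypted result is stored in text. As we saw in the approach section, what really matters to us is the JSON passed to window.parent.sf.videoResult.show(), so we pull out that part with a regular expression.

import re

# get the result as a JSON string (group 1 is the text between "show(" and ");;")
result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)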
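The extraction can be checked on a small stand-in string. The text below is made up and far shorter than a real decrypted result; the sketch also guards against re.search returning None, which would otherwise surface as an AttributeError.

```python
import re

# made-up stand-in for the decrypted text
text = (
    'foo();window.parent.sf.videoResult.show('
    '{"url": [{"url": "https://example.com/v.mp4"}]}'
    ');;bar();'
)

match = re.search(r"show\((.*?)\);;", text, re.I | re.M)
if match is None:
    raise ValueError("no show(...) call found in the decrypted text")
# group(1) is whatever sits between 'show(' and the first ');;'
result = match.group(1)
print(result)
```

Because the group is non-greedy, matching stops at the first );; after show(.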

ii. Parse the JSON

Python has many libraries that can parse JSON; here I use the json library (remember to import it).

import json

# use `json` to load the JSON
j = json.loads(result)

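iii. Get the download URL

This brings us to the last step. From the approach section and a JSON formatting tool, we can see that j["url"][num]["url"] is the download link, where num selects the video format we want (the entries differ in resolution and type).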
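Rather than fixing num by hand, the format list can be searched by its fields. In this sketch the field names url, quality and type are the ones this article refers to, but the sample values and their types are assumptions made for illustration, and pick_format is a hypothetical helper.

```python
import json

# made-up sample in the assumed shape of the parsed result
sample = json.loads("""
{"url": [
  {"url": "https://example.com/a.webm", "quality": "720", "type": "WEBM", "audio": false},
  {"url": "https://example.com/b.mp4", "quality": "360", "type": "MP4", "audio": true}
]}
""")


def pick_format(j: dict, quality: str, type_: str) -> int:
    """Return the index of the first entry matching quality and type."""
    for num, fmt in enumerate(j["url"]):
        if fmt.get("quality") == quality and fmt.get("type") == type_:
            return num
    raise LookupError("no entry with quality=%s, type=%s" % (quality, type_))


num = pick_format(sample, "360", "MP4")
downurl = sample["url"][num]["url"]
print(num, downurl)
```

Printing each entry's quality and type first is a quick way to see which values a real response actually uses.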

# the selection of video (in this case, num = 1 means the video is
# - 360p, known from j["url"][num]["quality"]
# - MP4, known from j["url"][num]["type"]
# - audio, known from j["url"][num]["audio"])
num = 1
downurl = j["url"][num]["url"]
# do some download
# thanks :)
# - EOF -
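The "do some download" step is left open above. One possible sketch streams the response to a file with urllib from the standard library (chosen here instead of requests so that it runs without extra packages). download is a hypothetical helper, and whether the real download link needs extra headers is something I have not checked.

```python
import shutil
import urllib.request


def download(downurl: str, path: str) -> None:
    """Stream the resource at downurl into the file at path."""
    # copyfileobj copies in chunks, so the whole video never sits in memory
    with urllib.request.urlopen(downurl) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

It would be called as download(downurl, "video.mp4"), where the file name is up to you.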

4. Full code

# -*- coding: utf-8 -*-
# @Time: 2021/1/10
# @Author: Eritque arcus
# @File: Youtube.py
# @License: MIT
# @Environment:
#           - windows 10
#           - python 3.6.2
# @Dependence:
#           - jsdom in npm(windows also can use)
#           - requests, execjs, re, json in python
import requests
import execjs
import re
import json


def gethtml(url):
    # Set the headers, or the website will not return any information.
    # You may need to change the cookie below.
    headers = {
        "cache-Control": "no-cache",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
        "content-type": "application/x-www-form-urlencoded",
        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
        "origin": "https://en.savefrom.net",
        "pragma": "no-cache",
        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "iframe",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/87.0.4280.88 Safari/537.36"
    }
    # Set the parameters; they can be read from Chrome
    kv = {
        "sf_url": url,
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome"
    }
    # do the POST request
    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
    r.raise_for_status()
    # return the response text
    return r.text


if __name__ == '__main__':
    # target (YouTube) URL
    url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
    # get the response text
    reo = gethtml(url)
    # Strip the code before and after the script (we need the JavaScript part;
    # the information is stored, encrypted, inside it)
    reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
    # Override the alert function: the script calls it in one place, and alerting
    # in execjs is meaningless. Without the override the code raises an error.
    reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
    # split into lines (this helps us find the decryption call in the last few lines)
    reA = reo.split("\n")
    # get the decryption call
    name = reA[len(reA) - 3].split(";")[0] + ";"
    # add jsdom to execjs because the code uses window
    # (maybe there is a solution without jsdom, but I have no idea)
    addition = """
    const jsdom = require("jsdom");
    const { JSDOM } = jsdom;
    const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
    window = dom.window;
    document = window.document;
    XMLHttpRequest = window.XMLHttpRequest;
    """
    # use execjs to compile the JS code; cwd is the output of `npm root -g`
    # (the path of npm's modules on your computer)
    ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')
    # do the decryption
    text = ct.eval(name.split("=")[1].replace(";", ""))
    # get the result as a JSON string (group 1 is the text between "show(" and ");;")
    result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)
    # use `json` to load the JSON
    j = json.loads(result)
    # the selection of video (in this case, num = 1 means the video is
    # - 360p, known from j["url"][num]["quality"]
    # - MP4, known from j["url"][num]["type"]
    # - audio, known from j["url"][num]["audio"])
    num = 1
    downurl = j["url"][num]["url"]
    # do some download
    # thanks :)
    # - EOF -

102 lines in total


Development environment

# @Environment:
#           - windows 10
#           - python 3.6.2

Dependencies

# @Dependence:
#           - jsdom in npm(windows also can use)
#           - requests, execjs, re, json in python

-end-

For web scraping

Author: https://www.cnblogs.com/Eritque-arcus/ or https://blog.csdn.net/qq_40832960
