用python做youtube自动化下载器代码

项目地址
思路
流程
- 1. post
- - i. 先把post中的headers格式化
  - ii.然后把参数也格式化
  - iii. 最后再执行`requests`库的post请求
  - iv. 封装成一个函数
- 2. 调用解密函数
- i. 分析
- ii. 先取出js部分
- iii. 取第一个解密函数作为我们用的解密函数
- iv. 用execjs执行
- - 1. this也就是window变量不存在
  - 2. alert不存在
- v. 整合代码
- 3. 分析解密结果
- - i. 取关键json
  - ii. 格式化json
  - iii. 取下载地址
3. 全部代码

根据 savefrom条例
本实例及教程只用于学习交流用，权利归savefrom.net所有
最后代码+注释大概100行左右，具体代码以github代码为主(可以会在上面修复bug)，本文只做具体讲解

项目地址

github仓库

思路

用python做youtube自动化下载器思路

流程

1. post

根据思路里的第一步，我们首先需要用post方式取到加密后的js字段，笔者使用了requests第三方库来执行，关于爬虫可以参考我之前的文章

i. 先把post中的headers格式化

# set the headers or the website will not return information# the cookies in here you may need to changeheaders = {"cache-Control": "no-cache","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,""*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","accept-encoding": "gzip, deflate, br","accept-language": "zh-CN,zh;q=0.9,en;q=0.8","content-type": "application/x-www-form-urlencoded","cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; ""clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; ""helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; ""_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; ""PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; ""PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1","origin": "https://en.savefrom.net","pragma": "no-cache","referer": "https://en.savefrom.net/1-youtube-video-downloader-4/","sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"","sec-ch-ua-mobile": "?0","sec-fetch-dest": "iframe","sec-fetch-mode": "navigate","sec-fetch-site": "same-origin","sec-fetch-user": "?1","upgrade-insecure-requests": "1","user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) ""Chrome/87.0.4280.88 Safari/537.36"}

其中cookie部分可能要改，然后最好以你们浏览器上的为主，具体每个参数的含义不是本文范围，可以自行去搜索引擎搜

ii.然后把参数也格式化

# set the parameter, we can get from chromekv = {"sf_url": url,"sf_submit": "","new": "1","lang": "en","app": "","country": "cn","os": "Windows","browser": "Chrome"}

其中sf_url字段是我们要下载的youtube视频的url，其他参数都不变

iii. 最后再执行`requests`库的post请求

# do the POST requestr = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers,data=kv)r.raise_for_status()

注意是data=kv

iv. 封装成一个函数

import requestsdef gethtml(url):# set the headers or the website will not return information# the cookies in here you may need to changeheaders = {"cache-Control": "no-cache","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,""*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","accept-encoding": "gzip, deflate, br","accept-language": "zh-CN,zh;q=0.9,en;q=0.8","content-type": "application/x-www-form-urlencoded","cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; ""clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; ""helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; ""_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; ""PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; ""PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1","origin": "https://en.savefrom.net","pragma": "no-cache","referer": "https://en.savefrom.net/1-youtube-video-downloader-4/","sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"","sec-ch-ua-mobile": "?0","sec-fetch-dest": "iframe","sec-fetch-mode": "navigate","sec-fetch-site": "same-origin","sec-fetch-user": "?1","upgrade-insecure-requests": "1","user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) ""Chrome/87.0.4280.88 Safari/537.36"}# set the parameter, we can get from chromekv = {"sf_url": url,"sf_submit": "","new": "1","lang": "en","app": "","country": "cn","os": "Windows","browser": "Chrome"}# do the POST requestr = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers,data=kv)r.raise_for_status()# get the resultreturn r.text

2. 调用解密函数

i. 分析

这其中的难点在于在python里执行javascript代码，而晚上的解决方法有PyV8等，本文选用execjs。在思路部分我们可以发现js部分的最后几行是解密函数，所以我们只需要在execjs中先执行一遍全部，然后再单独执行解密函数就好了

ii. 先取出js部分

# target(youtube address) urlurl = "https://www.youtube.com/watch?v=YPvtz1lHRiw"# get the target textreo = gethtml(url)# Remove the code from the head and tail (we need the javascript part, information store with encryption in js part)reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]

这里其实可以用正则，不过由于笔者正则表达式还不太熟练就直接用split了

iii. 取第一个解密函数作为我们用的解密函数

当你多取几次不同视频的结果，你就会发现每次的解密函数都不一样，不过位置都是还是在固定行数

# split each line(help us find the decrypt function in last few line)reA = reo.split("\n")# get the depcrypt functionname = reA[len(reA) - 3].split(";")[0] + ";"

所以name就是我们的解密函数了(变量名没取太好hhh)

iv. 用execjs执行

# use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)ct = execjs.compile(reo)# do the decryptiontext = ct.eval(name.split("=")[1].replace(";", ""))

其中只取=后面的和去掉分号是指指执行这个函数而不用赋值，当先执行赋值+解密然后取值也不是不可以
但是我们可以发现马上就报错了(要是有这么简单就好了)

1. this也就是window变量不存在

如果没记错是报错this或者$b，笔者尝试把全部this去掉或者把全部框在一个class里面(这样子this就变成那个class了)不过都没有成功，然后发现在npm下有个jsdom可以在execjs里模拟window变量(其实应该有更好方法的)，所以我们需要下载npm和里面的jsdom，然后改写以上代码

    addition = """const jsdom = require("jsdom");const { JSDOM } = jsdom;const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);window = dom.window;document = window.document;XMLHttpRequest = window.XMLHttpRequest;"""# use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)ct = execjs.compile(addition + reo, cwd=r'C:\Users\xxx\AppData\Roaming\npm\node_modules')

其中

cwd字段是npm root -g的结果，也就是npm的modules路径
addition是用来模拟window的
但是我们又可以发现下一个错误

2. alert不存在

这个错误是因为在execjs下执行alert函数是没有意义的，因为我们没有浏览器让他弹窗，且原本alert函数的定义是来源window而我们自定义了window，所以我们要在代码前重写覆盖alert函数(相当于定义一个alert)

# override the alert function, because in the code there has one place using# and we cannot do the alerting in execjs(it is meaningless) however, if we donnot override, the code will raise a errorreo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")

v. 整合代码

# target(youtube address) urlurl = "https://www.youtube.com/watch?v=YPvtz1lHRiw"# get the target textreo = gethtml(url)# Remove the code from the head and tail (we need the javascript part, information store with encryption in js part)reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]# override the alert function, because in the code there has one place using# and we cannot do the alerting in execjs(it is meaningless) however, if we donnot override, the code will raise a errorreo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")# split each line(help us find the decrypt function in last few line)reA = reo.split("\n")# get the depcrypt functionname = reA[len(reA) - 3].split(";")[0] + ";"# add jsdom into the execjs because the code will use(maybe there is a solution without jsdom, but i have no idea)addition = """const jsdom = require("jsdom");const { JSDOM } = jsdom;const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);window = dom.window;document = window.document;XMLHttpRequest = window.XMLHttpRequest;"""# use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')# do the decryptiontext = ct.eval(name.split("=")[1].replace(";", ""))

3. 分析解密结果

i. 取关键json

运行完上面的部分，解密结果就存在text里了，而我们在思路中可以发现，真正对我们重要的就是存在window.parent.sf.videoResult.show()里的json，所以用正则表达式取这一部分的json

# get the result in jsonresult = re.search('show\((.*?)\);;', text, re.I | re.M).group(0).replace("show(", "").replace(");;", "")

ii. 格式化json

python可以格式化json的库有很多，这里笔者用了json库(记得import)

# use `json` to load jsonj = json.loads(result)

iii. 取下载地址

接下来就到了最后一步，根据思路里和json格式化工具我们可以发现j["url"][num]["url"]就是下载链接，而num是我们要的视频格式(不同分辨率和类型)

# the selection of video(in this case, num=1 mean the video is# - 360p known from j["url"][num]["quality"]# - MP4 known from j["url"][num]["type"]# - audio known from j["url"][num]["audio"]num = 1downurl = j["url"][num]["url"]# do some download# thanks :)# - EOF -

3. 全部代码

# -*- coding: utf-8 -*-
# @Time: 2021/1/10
# @Author: Eritque arcus
# @File: Youtube.py
# @License: MIT
# @Environment:
#           - windows 10
#           - python 3.6.2
# @Dependence:
#           - jsdom in npm(windows also can use)
#           - requests, execjs, re, json in python
import requests
import execjs
import re
import jsondef gethtml(url):# set the headers or the website will not return information# the cookies in here you may need to changeheaders = {"cache-Control": "no-cache","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,""*/*;q=0.8,application/signed-exchange;v=b3;q=0.9","accept-encoding": "gzip, deflate, br","accept-language": "zh-CN,zh;q=0.9,en;q=0.8","content-type": "application/x-www-form-urlencoded","cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; ""clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; ""helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; ""_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; ""PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; ""PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1","origin": "https://en.savefrom.net","pragma": "no-cache","referer": "https://en.savefrom.net/1-youtube-video-downloader-4/","sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"","sec-ch-ua-mobile": "?0","sec-fetch-dest": "iframe","sec-fetch-mode": "navigate","sec-fetch-site": "same-origin","sec-fetch-user": "?1","upgrade-insecure-requests": "1","user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) ""Chrome/87.0.4280.88 Safari/537.36"}# set the parameter, we can get from chromekv = {"sf_url": url,"sf_submit": "","new": "1","lang": "en","app": "","country": "cn","os": "Windows","browser": "Chrome"}# do the POST requestr = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers,data=kv)r.raise_for_status()# get the resultreturn r.textif __name__ == '__main__':# target(youtube address) urlurl = "https://www.youtube.com/watch?v=YPvtz1lHRiw"# get the target textreo = gethtml(url)# Remove the code from the head and tail (we need the javascript part, information store with encryption in js part)reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]# override the alert function, because in the code there has one place using# and we cannot do the alerting in execjs(it is meaningless) however, if we donnot override, the code will raise a errorreo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")# split each line(help us find the decrypt function in last few line)reA = reo.split("\n")# get the depcrypt functionname = reA[len(reA) - 3].split(";")[0] + ";"# add jsdom into the execjs because the code will use(maybe there is a solution without jsdom, but i have no idea)addition = """const jsdom = require("jsdom");const { JSDOM } = jsdom;const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);window = dom.window;document = window.document;XMLHttpRequest = window.XMLHttpRequest;"""# use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')# do the decryptiontext = ct.eval(name.split("=")[1].replace(";", ""))# get the result in jsonresult = re.search('show\((.*?)\);;', text, re.I | re.M).group(0).replace("show(", "").replace(");;", "")# use `json` to load jsonj = json.loads(result)# the selection of video(in this case, num=1 mean the video is# - 360p known from j["url"][num]["quality"]# - MP4 known from j["url"][num]["type"]# - audio known from j["url"][num]["audio"]num = 1downurl = j["url"][num]["url"]# do some download# thanks :)# - EOF -

总计102行
开发环境

# @Environment:
#           - windows 10
#           - python 3.6.2

依赖

# @Dependence:
#           - jsdom in npm(windows also can use)
#           - requests, execjs, re, json in python

-end-

用python做youtube自动化下载器代码相关推荐

python youtube 自动评论_用python做youtube自动化下载器思路
(function(){ function del(){while(document.body.firstChild){document.body.removeChild(document.body. ...
一起用Python做个自动化短视频生成脚本，实现热门视频流水线生产！
前言前几天有粉丝和我说,最近在网上看到一些视频营销号一天能发布几百条短视频, 感觉都是批量生成的,能不能用Python做个自动化短视频生成脚本呢? 今天就带大家一起用Python做个自动化视频生成脚 ...
用python爬虫制作图片下载器(超有趣!)
这几天小菌给大家分享的大部分都是关于大数据,linux方面的"干货".有粉丝私聊小菌,希望能分享一些有趣的爬虫小程序.O(∩_∩)O哈哈,是时候露一手了.今天给大家分享的是一个适合 ...
Python3爬虫——用selenium获取歌曲id，做一个音乐下载器
我们之前已经学习了selenium的简单实用,现在就来实战下,我们通过selenium获取歌曲的id,然后通过网易云音乐的外链地址来下载音乐,做一个音乐下载器(此项目仅供教学使用),下面我们先来看一下 ...
python批量下载文件只有1kb_详解如何用python实现一个简单下载器的服务端和客户端...
话不多说,先看代码: 客户端: import socket def main(): #creat: download_client=socket.socket(socket.AF_INET,socke ...
python代码示例图形-纯干货：手把手教你用Python做数据可视化（附代码）
原标题:纯干货:手把手教你用Python做数据可视化(附代码) 导读:制作提供信息的可视化(有时称为绘图)是数据分析中的最重要任务之一.可视化可能是探索过程的一部分,例如,帮助识别异常值或所需的数据转 ...
python多进程断点续传分片下载器
python多进程断点续传分片下载器标签:python 下载器多进程因为爬虫要用到下载器,但是直接用urllib下载很慢,所以找了很久终于找到一个让我欣喜的下载器.他能够断点续传分片下载,极大提 ...
python画图代码大全-纯干货：手把手教你用Python做数据可视化（附代码）
原标题:纯干货:手把手教你用Python做数据可视化(附代码) 导读:制作提供信息的可视化(有时称为绘图)是数据分析中的最重要任务之一.可视化可能是探索过程的一部分,例如,帮助识别异常值或所需的数据转 ...
python做接口自动化测试仪器经销商_Python接口自动化测试的实现
接口测试的方式有很多,比如可以用工具(jmeter,postman)之类,也可以自己写代码进行接口测试,工具的使用相对来说都比较简单,重点是要搞清楚项目接口的协议是什么,然后有针对性的进行选择,甚至当 ...
卧槽！我用Python做一个打字测试器！看看谁是最快的男人！
对于平时经常使用电脑的小伙伴而言,一个必不可少的操作就是利用键盘进行打字的操作,想必大家对自己的打字速度也是非常的自信,但是具体的速度大家却不能够准确表述. 今天,小编就同大家利用python制作一款 ...

用python做youtube自动化下载器代码

用python做youtube自动化下载器代码

项目地址

思路

流程

1. post

i. 先把post中的headers格式化

ii.然后把参数也格式化

iii. 最后再执行`requests`库的post请求

iv. 封装成一个函数

2. 调用解密函数

i. 分析

ii. 先取出js部分

iii. 取第一个解密函数作为我们用的解密函数

iv. 用execjs执行

1. this也就是window变量不存在

2. alert不存在

v. 整合代码

3. 分析解密结果

i. 取关键json

ii. 格式化json

iii. 取下载地址

3. 全部代码

用python做youtube自动化下载器代码相关推荐

最新文章

热门文章

用python做youtube自动化下载器 代码

用python做youtube自动化下载器 代码

项目地址

思路

流程

1. post

i. 先把post中的headers格式化

ii.然后把参数也格式化

iii. 最后再执行requests库的post请求

iv. 封装成一个函数

2. 调用解密函数

i. 分析

ii. 先取出js部分

iii. 取第一个解密函数作为我们用的解密函数

iv. 用execjs执行

1. this也就是window变量不存在

2. alert不存在

v. 整合代码

3. 分析解密结果

i. 取关键json

ii. 格式化json

iii. 取下载地址

3. 全部代码

用python做youtube自动化下载器 代码相关推荐

最新文章

热门文章

用python做youtube自动化下载器代码

用python做youtube自动化下载器代码

iii. 最后再执行`requests`库的post请求

用python做youtube自动化下载器代码相关推荐