Building an automated YouTube downloader in Python: the code

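  • Project link
  • Approach
  • Walkthrough
    • 1. POST
      • i. Set up the headers for the POST
      • ii. Set up the parameters
      • iii. Send the POST request with the `requests` library
      • iv. Wrap it in a function
    • 2. Call the decrypt function
      • i. Analysis
      • ii. Extract the JS part
      • iii. Take the first decrypt call as the one we use
      • iv. Run it with execjs
        • 1. this (the window variable) does not exist
        • 2. alert does not exist
      • v. Put the code together
    • 3. Parse the decrypted result
      • i. Extract the key JSON
      • ii. Parse the JSON
      • iii. Get the download URL
  • 3. Full code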

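Per savefrom's terms:
This example and tutorial are for learning and discussion only; all rights belong to savefrom.net.
The final code plus comments comes to roughly 100 lines. The code on GitHub is authoritative (bugs may be fixed there); this article only walks through it in detail.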

Project link

GitHub repository

Approach

Building an automated YouTube downloader in Python: the approach

Walkthrough

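1. POST

Following the first step of the approach, we first need a POST request to fetch the encrypted JS payload. I use the third-party requests library for this; for background on crawlers, see my earlier articles.

i. Set up the headers for the POST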

    # set the headers or the website will not return information
    # the cookies in here you may need to change
    headers = {
        "cache-Control": "no-cache",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
        "content-type": "application/x-www-form-urlencoded",
        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
        "origin": "https://en.savefrom.net",
        "pragma": "no-cache",
        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "iframe",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/87.0.4280.88 Safari/537.36"
    }

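The cookie value may need changing, and it is best to copy the headers from your own browser. What each header means is outside the scope of this article; a search engine will tell you.

ii. Set up the parameters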

    # set the parameter, we can get from chrome
    kv = {
        "sf_url": url,
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome"
    }

The sf_url field is the URL of the YouTube video we want to download; the other parameters stay the same.

iii. Send the POST request with the requests library

    # do the POST request
    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
    r.raise_for_status()

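Note that the dict is passed as data=kv, so requests sends it as a form-encoded body, which matches the content-type header set above.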
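To make the data=kv point a little more concrete, here is a minimal sketch of what a form-encoded body looks like. It uses only the standard library's urllib.parse.urlencode rather than requests itself, and the trimmed-down kv dict is just for illustration.

```python
from urllib.parse import urlencode

# A trimmed-down version of the kv dict above (illustration only).
kv = {
    "sf_url": "https://www.youtube.com/watch?v=YPvtz1lHRiw",
    "sf_submit": "",
    "new": "1",
    "lang": "en",
}

# Form-encode the dict the way an HTML form submission would be encoded.
body = urlencode(kv)
print(body)
```

The special characters in the video URL are percent-encoded and the fields are joined with &, which is the format the application/x-www-form-urlencoded content type describes.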

iv. Wrap it in a function

    import requests


    def gethtml(url):
        # set the headers or the website will not return information
        # the cookies in here you may need to change
        headers = {
            "cache-Control": "no-cache",
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                      "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
            "content-type": "application/x-www-form-urlencoded",
            "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                      "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                      "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                      "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                      "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                      "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
            "origin": "https://en.savefrom.net",
            "pragma": "no-cache",
            "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
            "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
            "sec-ch-ua-mobile": "?0",
            "sec-fetch-dest": "iframe",
            "sec-fetch-mode": "navigate",
            "sec-fetch-site": "same-origin",
            "sec-fetch-user": "?1",
            "upgrade-insecure-requests": "1",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/87.0.4280.88 Safari/537.36"
        }
        # set the parameter, we can get from chrome
        kv = {
            "sf_url": url,
            "sf_submit": "",
            "new": "1",
            "lang": "en",
            "app": "",
            "country": "cn",
            "os": "Windows",
            "browser": "Chrome"
        }
        # do the POST request
        r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
        r.raise_for_status()
        # get the result
        return r.text

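2. Call the decrypt function

i. Analysis

The hard part is running JavaScript from Python. Solutions found online include PyV8, among others; this article uses execjs. As the approach article showed, the last few lines of the JS are the call to the decrypt function, so we only need to run the whole script once in execjs and then evaluate the decrypt call on its own.

ii. Extract the JS part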

    # target(youtube address) url
    url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
    # get the target text
    reo = gethtml(url)
    # Remove the code from the head and tail (we need the javascript part, information store with encryption in js part)
    reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]

A regular expression would also work here, but since I am not yet fluent with regex I simply use split.
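For readers who prefer the regex route, a minimal sketch might look like the following. The sample_html string is a hypothetical, heavily shortened stand-in for the real response, and extract_script is a helper name introduced here for illustration; it is not part of the original code.

```python
import re

# Hypothetical, heavily shortened stand-in for the page returned by gethtml().
sample_html = (
    "<html><body>"
    '<script type="text/javascript">var a = 1;\nvar b = 2;</script>'
    "</body></html>"
)


def extract_script(html):
    """Return the body of the first <script type="text/javascript"> block."""
    match = re.search(
        r'<script type="text/javascript">(.*?)</script>',
        html,
        re.S,  # let "." match newlines, since the script spans several lines
    )
    if match is None:
        raise ValueError("no inline script found in the response")
    return match.group(1)


script = extract_script(sample_html)
print(script)
```

Unlike the split version, this raises a readable error when the tag is missing instead of an IndexError.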

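iii. Take the first decrypt call as the one we use

If you fetch the result for several different videos, you will find that the decrypt function differs every time, but it always sits at the same line position (the third line from the end, as the code below assumes).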

    # split each line (helps us find the decrypt function in the last few lines)
    reA = reo.split("\n")
    # get the decrypt function
    name = reA[len(reA) - 3].split(";")[0] + ";"

So name now holds our decrypt statement (not the best variable name, haha).
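The line picking can be tried on a toy script without touching the network. The sketch below assumes a made-up five-line script (toy_js, decode and key are hypothetical names) whose third line from the end is an assignment, and also shows how the right-hand side of that assignment can be isolated.

```python
# Hypothetical stand-in for the extracted script; the real one is far longer.
toy_js = "\n".join([
    "(function(){",
    "var key = 'k';",
    "var res = decode('abc', key);",
    "show(res);",
    "})();",
])

lines = toy_js.split("\n")
# lines[-3] is the same element as lines[len(lines) - 3]
statement = lines[-3].split(";")[0] + ";"
# keep only the right-hand side so it can be evaluated as an expression
expression = statement.split("=", 1)[1].replace(";", "").strip()

print(statement)
print(expression)
```

Here split("=", 1) is a small variation on the article's code: it splits only at the first =, so an = inside the call's arguments would not cut the expression short.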

iv. Run it with execjs

    import execjs

    # use execjs to compile the js code
    ct = execjs.compile(reo)
    # do the decryption
    text = ct.eval(name.split("=")[1].replace(";", ""))

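Taking only the part after = and dropping the semicolon means we evaluate just the function call, without the assignment. Running the assignment and decryption first and then reading the value would also work.
But this immediately throws an error (if only it were that simple).

1. this (the window variable) does not exist

If I remember correctly, the error complains about this or $b. I tried removing every this, and also wrapping everything in a class (so that this would refer to that class), but neither worked. I then found that the npm package jsdom can simulate the window variable inside execjs (there is probably a better way), so we need to install npm and its jsdom package, then rewrite the code above.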

    addition = """
    const jsdom = require("jsdom");
    const { JSDOM } = jsdom;
    const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
    window = dom.window;
    document = window.document;
    XMLHttpRequest = window.XMLHttpRequest;
    """
    # use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)
    ct = execjs.compile(addition + reo, cwd=r'C:\Users\xxx\AppData\Roaming\npm\node_modules')

Where:

  • cwd is the output of npm root -g, i.e. the path to npm's global modules directory
  • addition is there to simulate window

But then we run into the next error.
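Before looking at that error, a note on the cwd value: instead of hard-coding the path, it could be looked up at runtime. The sketch below is one possible way, assuming npm is on PATH; npm_global_root is a hypothetical helper name, not part of the original code, and it returns None when the path cannot be determined.

```python
import shutil
import subprocess


def npm_global_root():
    """Return the output of `npm root -g`, or None if it cannot be determined."""
    npm = shutil.which("npm")  # resolves the npm executable found on PATH
    if npm is None:
        return None
    try:
        result = subprocess.run(
            [npm, "root", "-g"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


modules_path = npm_global_root()
print(modules_path)
```

The returned string could then be passed as cwd= to execjs.compile in place of the literal path.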

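2. alert does not exist

This error occurs because calling alert under execjs is meaningless: there is no browser to show a popup, and alert is normally defined on window, which we have replaced with our own. So we override alert before the code runs (in effect, defining a no-op alert).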

    # override the alert function: the script calls it in one place, and
    # alerting is meaningless in execjs; without the override the code raises an error
    reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
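To see what this replace does, it can be tried on a toy string. The two scripts below are hypothetical stand-ins; the second part points out that str.replace patches every occurrence of the marker unless a count is given.

```python
# Hypothetical two-line stand-in for the extracted script.
toy_js = "(function(){\nalert('hi');})();"

stub = "(function(){\nthis.alert=function(){};"
patched = toy_js.replace("(function(){", stub)
print(patched)

# str.replace patches every occurrence; pass a count to limit it to the first.
twice = "(function(){})();(function(){})();"
patched_all = twice.replace("(function(){", stub)
first_only = twice.replace("(function(){", stub, 1)
print(patched_all.count("this.alert"), first_only.count("this.alert"))
```

Whether patching every occurrence matters depends on the real script; the article's version patches all of them.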

v. Put the code together

    # target(youtube address) url
    url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
    # get the target text
    reo = gethtml(url)
    # Remove the code from the head and tail (we need the javascript part, information store with encryption in js part)
    reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
    # override the alert function: the script calls it in one place, and
    # alerting is meaningless in execjs; without the override the code raises an error
    reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
    # split each line (helps us find the decrypt function in the last few lines)
    reA = reo.split("\n")
    # get the decrypt function
    name = reA[len(reA) - 3].split(";")[0] + ";"
    # add jsdom into the execjs because the code will use(maybe there is a solution without jsdom, but i have no idea)
    addition = """
    const jsdom = require("jsdom");
    const { JSDOM } = jsdom;
    const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
    window = dom.window;
    document = window.document;
    XMLHttpRequest = window.XMLHttpRequest;
    """
    # use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)
    ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')
    # do the decryption
    text = ct.eval(name.split("=")[1].replace(";", ""))

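3. Parse the decrypted result

i. Extract the key JSON

After running the part above, the decrypted result is stored in text. As the approach article showed, what really matters to us is the JSON passed to window.parent.sf.videoResult.show(), so we extract that JSON with a regular expression.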

    # get the result in json (group 1 is the text between "show(" and ");;")
    result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)
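The pattern can be checked against a toy string first. In the sketch below the decrypted text is a hypothetical, much shortened stand-in for the real payload; json.loads is used only as a sanity check that the captured group is valid JSON.

```python
import json
import re

# Hypothetical decrypted text; the real payload is much larger.
text = 'window.parent.sf.videoResult.show({"url": [{"url": "https://example.com/v.mp4"}]});;'

match = re.search(r"show\((.*?)\);;", text, re.S)
if match is None:
    raise ValueError("show(...) call not found in decrypted text")

payload = match.group(1)  # only the text between "show(" and ");;"
data = json.loads(payload)
print(data["url"][0]["url"])
```

Checking for None before calling group gives a clearer failure than an AttributeError when the site changes its output.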

ii. Parse the JSON

Python has many libraries that can parse JSON; I use the built-in json module here (remember to import it).

    # use `json` to load json
    j = json.loads(result)

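iii. Get the download URL

Now for the final step. From the approach article and a JSON formatter, we can see that j["url"][num]["url"] is the download link, where num selects the video format we want (resolution and file type).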

    # the selection of video; in this case num=1 means the video is
    # - 360p known from j["url"][num]["quality"]
    # - MP4 known from j["url"][num]["type"]
    # - audio known from j["url"][num]["audio"]
    num = 1
    downurl = j["url"][num]["url"]
    # do some download
    # thanks :)
    # - EOF -
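Rather than hard-coding num, the entry could be chosen by its fields. The sketch below is only an illustration: the field names quality, type, audio and url come from the comments above, but the sample values, the example.com URLs and the pick_format helper are assumptions, and the real response may format these values differently.

```python
# Hypothetical parsed result with the fields the article mentions.
j = {
    "url": [
        {"url": "https://example.com/720.mp4", "quality": "720", "type": "MP4", "audio": False},
        {"url": "https://example.com/360.mp4", "quality": "360", "type": "MP4", "audio": True},
        {"url": "https://example.com/360.webm", "quality": "360", "type": "WEBM", "audio": True},
    ]
}


def pick_format(result, quality, ext):
    """Return the first entry matching quality and type, or None."""
    for entry in result["url"]:
        same_quality = str(entry.get("quality")) == quality
        same_type = str(entry.get("type", "")).upper() == ext.upper()
        if same_quality and same_type:
            return entry
    return None


chosen = pick_format(j, "360", "mp4")
downurl = chosen["url"] if chosen else None
print(downurl)
```

Selecting by field keeps working if the site reorders the list, which a fixed index would not.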

3. Full code

    # -*- coding: utf-8 -*-
    # @Time: 2021/1/10
    # @Author: Eritque arcus
    # @File: Youtube.py
    # @License: MIT
    # @Environment:
    #           - windows 10
    #           - python 3.6.2
    # @Dependence:
    #           - jsdom in npm(windows also can use)
    #           - requests, execjs, re, json in python
    import requests
    import execjs
    import re
    import json


    def gethtml(url):
        # set the headers or the website will not return information
        # the cookies in here you may need to change
        headers = {
            "cache-Control": "no-cache",
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                      "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
            "content-type": "application/x-www-form-urlencoded",
            "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                      "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                      "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                      "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                      "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                      "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
            "origin": "https://en.savefrom.net",
            "pragma": "no-cache",
            "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
            "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
            "sec-ch-ua-mobile": "?0",
            "sec-fetch-dest": "iframe",
            "sec-fetch-mode": "navigate",
            "sec-fetch-site": "same-origin",
            "sec-fetch-user": "?1",
            "upgrade-insecure-requests": "1",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/87.0.4280.88 Safari/537.36"
        }
        # set the parameter, we can get from chrome
        kv = {
            "sf_url": url,
            "sf_submit": "",
            "new": "1",
            "lang": "en",
            "app": "",
            "country": "cn",
            "os": "Windows",
            "browser": "Chrome"
        }
        # do the POST request
        r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
        r.raise_for_status()
        # get the result
        return r.text


    if __name__ == '__main__':
        # target(youtube address) url
        url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
        # get the target text
        reo = gethtml(url)
        # Remove the code from the head and tail (we need the javascript part, information store with encryption in js part)
        reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
        # override the alert function: the script calls it in one place, and
        # alerting is meaningless in execjs; without the override the code raises an error
        reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
        # split each line (helps us find the decrypt function in the last few lines)
        reA = reo.split("\n")
        # get the decrypt function
        name = reA[len(reA) - 3].split(";")[0] + ";"
        # add jsdom into the execjs because the code will use(maybe there is a solution without jsdom, but i have no idea)
        addition = """
        const jsdom = require("jsdom");
        const { JSDOM } = jsdom;
        const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
        window = dom.window;
        document = window.document;
        XMLHttpRequest = window.XMLHttpRequest;
        """
        # use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)
        ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')
        # do the decryption
        text = ct.eval(name.split("=")[1].replace(";", ""))
        # get the result in json (group 1 is the text between "show(" and ");;")
        result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)
        # use `json` to load json
        j = json.loads(result)
        # the selection of video; in this case num=1 means the video is
        # - 360p known from j["url"][num]["quality"]
        # - MP4 known from j["url"][num]["type"]
        # - audio known from j["url"][num]["audio"]
        num = 1
        downurl = j["url"][num]["url"]
        # do some download
        # thanks :)
        # - EOF -
  • Total: 102 lines
  • Development environment
# @Environment:
#           - windows 10
#           - python 3.6.2
  • Dependencies
# @Dependence:
#           - jsdom in npm(windows also can use)
#           - requests, execjs, re, json in python
-end-
