Workflow

1. POST request

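Following the first step of the approach, we first need to fetch the encrypted JS via a POST request. I use the third-party requests library for this; for background on web scraping, see my earlier article.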

i. First, format the headers for the POST request

```python
# set the headers or the website will not return information
# the cookies in here you may need to change
headers = {
    "cache-Control": "no-cache",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
              "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
    "content-type": "application/x-www-form-urlencoded",
    "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
              "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
              "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
              "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
              "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
              "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
    "origin": "https://en.savefrom.net",
    "pragma": "no-cache",
    "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
    "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
    "sec-ch-ua-mobile": "?0",
    "sec-fetch-dest": "iframe",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/87.0.4280.88 Safari/537.36",
}
```

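The cookie part may need to be changed; it is best to use the values from your own browser. What each header means is beyond the scope of this article, so look it up if you are interested.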
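If you copy the cookie header straight from your browser's developer tools, a small helper can turn it into a dict, which requests also accepts through its cookies= parameter. This is a sketch, not part of the original script: cookie_header_to_dict is a hypothetical name, and it does no validation beyond splitting on ";" and on the first "=".

```python
def cookie_header_to_dict(raw: str) -> dict:
    """Split a raw cookie header value into a {name: value} dict (hypothetical helper)."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty fragments, e.g. after a trailing ";"
        # split on the first "=" only, so values may themselves contain "="
        name, _, value = part.partition("=")
        cookies[name] = value
    return cookies


if __name__ == "__main__":
    raw = "lang=en; country=CN; x-requested-with=; _gat_inpagePush2=1"
    print(cookie_header_to_dict(raw))
    # {'lang': 'en', 'country': 'CN', 'x-requested-with': '', '_gat_inpagePush2': '1'}
```

To use it you would drop the "cookie" key from headers and pass cookies=cookie_header_to_dict(raw) to requests.post; keeping the raw header as the original does works just as well.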

ii. Then format the parameters as well

```python
# set the parameter, we can get from chrome
kv = {
    "sf_url": url,
    "sf_submit": "",
    "new": "1",
    "lang": "en",
    "app": "",
    "country": "cn",
    "os": "Windows",
    "browser": "Chrome",
}
```

The sf_url field is the URL of the YouTube video we want to download; the other parameters stay unchanged.

iii. Finally, send the POST request with requests

```python
# do the POST request
r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
r.raise_for_status()
```

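Note that the parameters are passed as data=kv, so they are sent form-encoded in the request body, matching the content-type header above.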
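To see what data=kv actually puts in the request body, the sketch below form-encodes the same fields with the standard library's urllib.parse.urlencode, which should match what requests sends for a plain dict passed as data=. The video URL is the sample one used later in this article.

```python
from urllib.parse import urlencode

# the same form fields as kv above, filled in with the sample video URL
kv = {
    "sf_url": "https://www.youtube.com/watch?v=YPvtz1lHRiw",
    "sf_submit": "",
    "new": "1",
    "lang": "en",
    "app": "",
    "country": "cn",
    "os": "Windows",
    "browser": "Chrome",
}

# application/x-www-form-urlencoded body: key=value pairs joined by "&",
# with reserved characters such as ":" "/" "?" "=" percent-encoded
body = urlencode(kv)
print(body)
# sf_url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DYPvtz1lHRiw&sf_submit=&new=1&lang=en&app=&country=cn&os=Windows&browser=Chrome
```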

iv. Wrap it in a function

```python
import requests


def gethtml(url):
    # set the headers or the website will not return information
    # the cookies in here you may need to change
    headers = {
        "cache-Control": "no-cache",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
        "content-type": "application/x-www-form-urlencoded",
        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
        "origin": "https://en.savefrom.net",
        "pragma": "no-cache",
        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "iframe",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/87.0.4280.88 Safari/537.36",
    }
    # set the parameter, we can get from chrome
    kv = {
        "sf_url": url,
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome",
    }
    # do the POST request
    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
    r.raise_for_status()
    # get the result
    return r.text
```

2. Call the decryption function

i. Analysis

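The hard part is executing JavaScript from Python. Solutions available online include PyV8; this article uses execjs. In the approach section we saw that the last few lines of the JS are the decryption function, so we only need to run the whole script once in execjs and then evaluate the decryption function on its own.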

ii. First, extract the JS part

```python
# target(youtube address) url
url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
# get the target text
reo = gethtml(url)
# Remove the code from the head and tail (we need the javascript part, information store with encryption in js part)
reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
```

A regular expression would also work here, but since I am not yet fluent with regex, I simply used split.
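For reference, a regex version of the same extraction might look like the sketch below. Like the split version, it assumes the wanted code is in the first <script type="text/javascript"> block; extract_script is a hypothetical helper name, and it raises ValueError rather than IndexError when no such block exists.

```python
import re

# non-greedy: stop at the first </script> after the opening tag;
# re.S lets "." match newlines, since the script spans many lines
SCRIPT_RE = re.compile(r'<script type="text/javascript">(.*?)</script>', re.S)


def extract_script(html: str) -> str:
    """Return the body of the first <script type="text/javascript"> block."""
    m = SCRIPT_RE.search(html)
    if m is None:
        raise ValueError('no <script type="text/javascript"> block found')
    return m.group(1)


if __name__ == "__main__":
    page = '<p>x</p><script type="text/javascript">\nvar a = 1;\n</script>'
    print(repr(extract_script(page)))  # '\nvar a = 1;\n'
```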

iii. Take the first decryption function found as the one we use

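If you fetch results for several different videos, you will find that the decryption function is different every time, but it always sits at the same line position (the third-to-last line).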

```python
# split each line (helps us find the decrypt function in the last few lines)
reA = reo.split("\n")
# get the decrypt function
name = reA[len(reA) - 3].split(";")[0] + ";"
```

So name now holds our decryption function (not the best variable name, haha).
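The line-picking logic can be checked on a small made-up script. The sketch below wraps the same expression in a hypothetical helper, find_decrypt_statement; the sample JS is invented for illustration and is not real savefrom.net output. Note that when the script ends with a newline, split("\n") yields an empty last element, and that element counts toward the three.

```python
def find_decrypt_statement(js: str) -> str:
    """Same logic as reA[len(reA) - 3].split(";")[0] + ";" (hypothetical helper)."""
    lines = js.split("\n")
    if len(lines) < 3:
        raise ValueError("script has fewer than three lines")
    # third-to-last line, first statement on it, semicolon restored
    return lines[-3].split(";")[0] + ";"


if __name__ == "__main__":
    # invented sample; the trailing newline makes lines[-1] an empty string
    sample = (
        "var helper = 1;\n"
        "function f(x){return x;}\n"
        "var res = f('abc'); other();\n"
        "console.log(res);\n"
    )
    print(find_decrypt_statement(sample))  # var res = f('abc');
```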

iv. Run it with execjs

```python
import execjs  # pip install PyExecJS

# use execjs to compile and run the whole script
ct = execjs.compile(reo)
# do the decryption: evaluate only the expression to the right of the first "="
text = ct.eval(name.split("=", 1)[1].replace(";", ""))
```

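Taking only the part after = and removing the semicolon means we execute just the function call, without the assignment. Running the assignment plus the decryption first and then reading the variable would also work.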
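A slightly more defensive way to strip the assignment is to split only on the first "=", so an argument that itself contains "=" survives intact; splitting on every "=" would cut such an argument short. The sketch below is a hypothetical helper, call_expression, not the author's code, and the sample statements are made up.

```python
def call_expression(statement: str) -> str:
    """Turn 'var res = decrypt(...);' into 'decrypt(...)' for evaluation."""
    # partition splits on the first "=" only
    _, sep, rhs = statement.partition("=")
    if not sep:
        raise ValueError("statement contains no assignment")
    # drop surrounding whitespace and the trailing semicolon(s)
    return rhs.strip().rstrip(";").strip()


if __name__ == "__main__":
    print(call_expression("var res = f('abc');"))  # f('abc')
    print(call_expression("r = g('a=b');"))        # g('a=b')
```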

But it fails right away (if only it were that simple).

1. this, i.e. the window object, does not exist

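If I remember correctly, the error was about this or $b. I tried removing every this, and also wrapping everything in a class (so that this would refer to that class), but neither worked. Then I found that npm has a package, jsdom, that can emulate the window object inside execjs (there is probably a better way). So we need to install npm and jsdom, then rewrite the code above as follows: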

```python
addition = """
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
XMLHttpRequest = window.XMLHttpRequest;
"""
# use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)
ct = execjs.compile(addition + reo, cwd=r'C:\Users\xxx\AppData\Roaming\npm\node_modules')
```

Where:

  • cwd is the output of npm root -g, i.e. the path of npm's global modules directory
  • addition emulates the window object
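
Rather than hard-coding the cwd path, you could ask npm for it at run time. The sketch below, a hypothetical helper named npm_global_root, runs npm root -g and returns None if npm is missing or the command fails. It uses stdout=PIPE and universal_newlines instead of capture_output and text= so that it also runs on the Python 3.6 used in this article.

```python
import shutil
import subprocess
from typing import Optional


def npm_global_root() -> Optional[str]:
    """Return the output of `npm root -g`, or None if it cannot be obtained."""
    npm = shutil.which("npm")  # on Windows this resolves npm.cmd via PATHEXT
    if npm is None:
        return None
    try:
        proc = subprocess.run(
            [npm, "root", "-g"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (Python 3.6 compatible)
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    root = proc.stdout.strip()
    return root or None


if __name__ == "__main__":
    print(npm_global_root())
```

You would then pass cwd=npm_global_root() to execjs.compile.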

However, running it again reveals the next error.

2. alert does not exist

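This error occurs because calling alert under execjs is meaningless: there is no browser to show a pop-up. Moreover, alert is normally defined on window, and we have replaced window with our own. So we override alert before the code runs (in effect, we define a do-nothing alert ourselves).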

```python
# override the alert function: the code uses it in one place, and alerting is meaningless
# in execjs, but if we do not override it, the code raises an error
reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
```
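The override can be checked on a made-up snippet. Note that str.replace rewrites every "(function(){" in the script, not just the first, so each such wrapper gets its own no-op alert. The sketch below (inject_alert_stub is a hypothetical name) shows that behavior, and shows that passing a count of 1 to replace would limit it to the first occurrence.

```python
ALERT_STUB = "this.alert=function(){};"


def inject_alert_stub(js: str, count: int = -1) -> str:
    """Insert a no-op alert at the start of "(function(){" wrappers.

    count=-1 (the default) rewrites every wrapper; count=1 touches only the first.
    """
    return js.replace("(function(){", "(function(){\n" + ALERT_STUB, count)


if __name__ == "__main__":
    src = "(function(){ alert('a'); })();\n(function(){ var b = 2; })();"
    print(inject_alert_stub(src).count(ALERT_STUB))     # 2
    print(inject_alert_stub(src, 1).count(ALERT_STUB))  # 1
```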

v. Putting the code together

```python
import execjs  # pip install PyExecJS

# target (youtube address) url
url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
# get the target text
reo = gethtml(url)
# remove the code from the head and tail (we need the javascript part; the information is stored, encrypted, in it)
reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
# override the alert function: the code uses it in one place, and alerting is meaningless
# in execjs, but if we do not override it, the code raises an error
reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
# split each line (helps us find the decrypt function in the last few lines)
reA = reo.split("\n")
# get the decrypt function
name = reA[len(reA) - 3].split(";")[0] + ";"
# add jsdom to execjs because the code uses window (maybe there is a solution without jsdom, but I have no idea)
addition = """
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
XMLHttpRequest = window.XMLHttpRequest;
"""
# use execjs to execute the js code; cwd is the result of `npm root -g` (the npm modules path on your computer)
ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')
# do the decryption: evaluate only the expression to the right of the first "="
text = ct.eval(name.split("=", 1)[1].replace(";", ""))
```

3. Analyze the decryption result

i. Extract the key JSON

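After running the code above, the decrypted result is stored in text. As we saw in the approach section, the only part that really matters to us is the JSON passed to window.parent.sf.videoResult.show(), so we extract that JSON with a regular expression.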

```python
import re

# get the result in json: group(1) is everything between "show(" and ");;"
result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)
```

ii. Parse the JSON

Python has many libraries for parsing JSON; here I use the json module (remember to import it).

```python
import json

# use `json` to load json
j = json.loads(result)
```

iii. Get the download URL

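Now for the last step. From the approach section and a JSON viewer, we can see that j["url"][num]["url"] is the download link, where num selects the video format (different resolutions and types).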

```python
# the selection of video (in this case, num=1 means the video is
# - 360p, known from j["url"][num]["quality"]
# - MP4, known from j["url"][num]["type"]
# - with audio, known from j["url"][num]["audio"])
num = 1
downurl = j["url"][num]["url"]
# do some download
# thanks :)
# - EOF -
```
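For the "do some download" step, one possible sketch: pick an entry by its fields rather than by a fixed index, then stream it to disk with the standard library. The field names quality and type come from the comments above, but their exact values (for example whether quality reads "360" or "360p") are assumptions to check against your own j. pick_format and download are hypothetical helpers, and the real download link may require extra headers such as a user-agent.

```python
import shutil
import urllib.request
from typing import Optional


def pick_format(result: dict, **wanted) -> Optional[dict]:
    """Return the first entry of result["url"] whose fields equal every wanted value."""
    for entry in result.get("url", []):
        if all(entry.get(key) == value for key, value in wanted.items()):
            return entry
    return None


def download(url: str, path: str, chunk_size: int = 1 << 16) -> None:
    """Stream url to path chunk by chunk instead of holding it all in memory."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out, chunk_size)


if __name__ == "__main__":
    # invented sample; real values of "quality" and "type" may differ
    j = {"url": [
        {"url": "https://example.com/720.mp4", "quality": "720", "type": "MP4"},
        {"url": "https://example.com/360.mp4", "quality": "360", "type": "MP4"},
    ]}
    entry = pick_format(j, quality="360", type="MP4")
    print(entry["url"])  # https://example.com/360.mp4
    # download(entry["url"], "video.mp4")  # would fetch the file over the network
```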

4. Full code

```python
# -*- coding: utf-8 -*-
# @Time: 2021/1/10
# @Author: Eritque arcus
# @File: Youtube.py
# @License: MIT
# @Environment:
#           - windows 10
#           - python 3.6.2
# @Dependence:
#           - jsdom in npm (works on windows too)
#           - requests, execjs (pip install PyExecJS), re, json in python
import requests
import execjs
import re
import json


def gethtml(url):
    # set the headers or the website will not return information
    # the cookies in here you may need to change
    headers = {
        "cache-Control": "no-cache",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
        "content-type": "application/x-www-form-urlencoded",
        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
        "origin": "https://en.savefrom.net",
        "pragma": "no-cache",
        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "iframe",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/87.0.4280.88 Safari/537.36",
    }
    # set the parameter, we can get from chrome
    kv = {
        "sf_url": url,
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome",
    }
    # do the POST request
    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers, data=kv)
    r.raise_for_status()
    # get the result
    return r.text


if __name__ == '__main__':
    # target (youtube address) url
    url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"
    # get the target text
    reo = gethtml(url)
    # remove the code from the head and tail (we need the javascript part; the information is stored, encrypted, in it)
    reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]
    # override the alert function: the code uses it in one place, and alerting is meaningless
    # in execjs, but if we do not override it, the code raises an error
    reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
    # split each line (helps us find the decrypt function in the last few lines)
    reA = reo.split("\n")
    # get the decrypt function
    name = reA[len(reA) - 3].split(";")[0] + ";"
    # add jsdom to execjs because the code uses window (maybe there is a solution without jsdom, but I have no idea)
    addition = """
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
XMLHttpRequest = window.XMLHttpRequest;
"""
    # use execjs to execute the js code; cwd is the result of `npm root -g`
    # (replace it with the npm modules path on your computer)
    ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')
    # do the decryption: evaluate only the expression to the right of the first "="
    text = ct.eval(name.split("=", 1)[1].replace(";", ""))
    # get the result in json: group(1) is everything between "show(" and ");;"
    result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)
    # use `json` to load json
    j = json.loads(result)
    # the selection of video (in this case, num=1 means the video is
    # - 360p, known from j["url"][num]["quality"]
    # - MP4, known from j["url"][num]["type"]
    # - with audio, known from j["url"][num]["audio"])
    num = 1
    downurl = j["url"][num]["url"]
    # do some download
    # thanks :)
    # - EOF -
```

  • 102 lines in total


  • Development environment: Windows 10, Python 3.6.2
  • Dependencies: jsdom (installed via npm, works on Windows too); requests, execjs, re and json in Python

-end-


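Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original and this notice.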

Author: https://www.cnblogs.com/Eritque-arcus/ or https://blog.csdn.net/qq_40832960
