Converting VTT subtitles to SRT with Python
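I recently downloaded some subtitles that were all in VTT format, which many video players cannot open, so I went looking for conversion code. Most of the script below comes from GitHub; I added one function and made a few small changes.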
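Usage:
1. Create a new text file and rename it (extension included) to vtt2srt.py, or any other name you prefer, then paste the code into it.
2. Copy vtt2srt.py into the folder that contains the VTT files you want to convert.
3. Hold Shift and right-click inside that folder to open PowerShell or cmd there, then run the following command: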
python vtt2srt.py -i ./ -o ./
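-i is followed by the input folder. The script automatically looks for VTT files in that folder; use ./ for the current folder.
-o is followed by the output folder; use ./ to write the results to the current folder.
Both parameters can be omitted, in which case each defaults to the current folder.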
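As a rough illustration of the two parameters, the sketch below mirrors the defaults and the extension filter used in the script. The helper names parse_folders and find_subtitles are made up for this example and do not appear in the script itself.

```python
import argparse
import os
import tempfile

SUPPORTED_EXTENSIONS = [".xml", ".vtt"]


def parse_folders(argv):
    """Parse -i/-o; both fall back to "." when omitted (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", type=str, default=".")
    parser.add_argument("-o", "--output", type=str, default=".")
    return parser.parse_args(argv)


def find_subtitles(folder):
    """Return file names in `folder` with a supported extension (hypothetical helper)."""
    return sorted(fn for fn in os.listdir(folder)
                  if fn[-4:].lower() in SUPPORTED_EXTENSIONS)


# No arguments given: both folders default to the current directory.
args = parse_folders([])
print(args.input, args.output)

# Only files directly inside the folder are considered; the extension
# check is case-insensitive and other file types are skipped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.vtt", "B.VTT", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    print(find_subtitles(tmp))
```

Note that subfolders are not searched, since os.listdir only returns the entries of the folder itself.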
The code is below:
import argparse
import codecs
import math
import os
import re

SUPPORTED_EXTENSIONS = [".xml", ".vtt"]


def leading_zeros(value, digits=2):
    # Left-pad a number with zeros to a fixed width.
    value = "000000" + str(value)
    return value[-digits:]


def convert_time(raw_time):
    if int(raw_time) == 0:
        return "{}:{}:{},{}".format(0, 0, 0, 0)

    ms = '000'
    if len(raw_time) > 4:
        ms = leading_zeros(int(raw_time[:-4]) % 1000, 3)
    time_in_seconds = int(raw_time[:-7]) if len(raw_time) > 7 else 0
    second = leading_zeros(time_in_seconds % 60)
    minute = leading_zeros(int(math.floor(time_in_seconds / 60)) % 60)
    hour = leading_zeros(int(math.floor(time_in_seconds / 3600)))
    return "{}:{}:{},{}".format(hour, minute, second, ms)


def xml_id_display_align_before(text):
    r"""
    displayAlign="before" means the current sub will be displayed on top,
    that is, not at the bottom. We check what's the xml:id associated to it
    to have an {\an8} position tag in the output file.
    """
    align_before_re = re.compile(
        u'<region.*tts:displayAlign="before".*xml:id="(.*)"/>')
    has_align_before = re.search(align_before_re, text)
    if has_align_before:
        return has_align_before.group(1)
    return u""


def xml_get_cursive_style_ids(text):
    style_section = re.search("<styling>(.*)</styling>", text, flags=re.DOTALL)
    if not style_section:
        return []
    style_ids_re = re.compile(
        '<style.* tts:fontStyle="italic".* xml:id="([a-zA-Z0-9_.]+)"')
    return [re.search(style_ids_re, line).groups()[0]
            for line in style_section.group().split("\n")
            if re.search(style_ids_re, line)]


def xml_cleanup_spans_start(span_id_re, cursive_ids, text):
    has_cursive = []
    span_start_tags = re.findall(span_id_re, text)
    for s in span_start_tags:
        has_cursive.append(u"<i>" if s[1] in cursive_ids else u"")
        text = has_cursive[-1].join(text.split(s[0], 1))
    return text, has_cursive


def xml_cleanup_spans_end(span_end_re, text, has_cursive):
    span_end_tags = re.findall(span_end_re, text)
    for s, cursive in zip(span_end_tags, has_cursive):
        cursive = u"</i>" if cursive else u""
        text = cursive.join(text.split(s, 1))
    return text


def to_srt(text, extension):
    if extension.lower() == ".xml":
        return xml_to_srt(text)
    if extension.lower() == ".vtt":
        return vtt_to_srt(text)


def format_text(line):
    # Remove one left-to-right mark. The character is invisible in the
    # original listing; it is assumed here to be U+200E.
    line = line.replace(u"\u200e", "", 1)
    # Strip inline tags such as <c> or <b>. A regex replaces the original
    # find()-based loop, which never terminated when a "<" had no ">" after it.
    return re.sub(r"<[^>]*>", "", line)


def convert_vtt_time(line):
    times = line.replace(".", ",").split(" --> ")
    if len(times[0]) == 9:
        # "MM:SS,mmm" has no hours field; SRT requires one.
        times = ["00:" + t for t in times]
    # Drop any cue settings that follow the end time.
    return "{} --> {}".format(times[0], times[1].split(" ")[0])


def vtt_to_srt(text):
    if not text.startswith(u"\ufeffWEBVTT") and not text.startswith(u"WEBVTT"):
        raise Exception(".vtt format must start with WEBVTT, wrong file?")

    lines = []
    current_sub_line = []
    for line in text.split("\n"):
        if current_sub_line:
            line = format_text(line)
            current_sub_line.append(line)
            # A blank line (or a bare "\r" in CRLF files) ends the cue.
            if not line or line == "\r":
                lines.append("\n".join(current_sub_line) + "\n")
                current_sub_line = []
        elif " --> " in line:
            current_sub_line = [convert_vtt_time(line)]
    if current_sub_line:
        lines.append("\n".join(current_sub_line))

    return "".join((u"{}\n{}".format(i, l) for i, l in enumerate(lines, 1)))


def xml_to_srt(text):
    def append_subs(start, end, prev_content, format_time):
        subs.append({
            "start_time": convert_time(start) if format_time else start,
            "end_time": convert_time(end) if format_time else end,
            "content": u"\n".join(prev_content),
        })

    display_align_before = xml_id_display_align_before(text)
    begin_re = re.compile(r"\s*<p begin=")
    sub_lines = (l for l in text.split("\n") if re.search(begin_re, l))
    subs = []
    prev_time = {"start": 0, "end": 0}
    prev_content = []
    start = end = ''
    start_re = re.compile(r'begin\="([0-9:\.]*)')
    end_re = re.compile(r'end\="([0-9:\.]*)')
    content_re = re.compile(u'">(.*)</p>')
    # some span tags are used for italics, we'll replace them by <i> and </i>,
    # which is the standard for .srt files. We ignore all other uses.
    cursive_ids = xml_get_cursive_style_ids(text)
    span_id_re = re.compile(u'(<span style="([a-zA-Z0-9_.]+)">)+')
    span_end_re = re.compile(u'(</span>)+')
    br_re = re.compile(r'(<br\s*\/?>)+')
    fmt_t = True
    for s in sub_lines:
        s, has_cursive = xml_cleanup_spans_start(span_id_re, cursive_ids, s)
        string_region_re = r'<p(.*region="' + display_align_before + r'".*")>(.*)</p>'
        s = re.sub(string_region_re, r'<p\1>{\\an8}\2</p>', s)
        content = re.search(content_re, s).group(1)
        br_tags = re.search(br_re, content)
        if br_tags:
            content = u"\n".join(content.split(br_tags.group()))
        content = xml_cleanup_spans_end(span_end_re, content, has_cursive)
        prev_start = prev_time["start"]
        start = re.search(start_re, s).group(1)
        end = re.search(end_re, s).group(1)
        if len(start.split(":")) > 1:
            fmt_t = False
            start = start.replace(".", ",")
            end = end.replace(".", ",")
        if (prev_start == start and prev_time["end"] == end) or not prev_start:
            # Fix for multiple lines starting at the same time
            prev_time = {"start": start, "end": end}
            prev_content.append(content)
            continue
        append_subs(prev_time["start"], prev_time["end"], prev_content, fmt_t)
        prev_time = {"start": start, "end": end}
        prev_content = [content]
    append_subs(start, end, prev_content, fmt_t)
    lines = (u"{}\n{} --> {}\n{}\n".format(
        s + 1, subs[s]["start_time"], subs[s]["end_time"], subs[s]["content"])
        for s in range(len(subs)))
    return u"\n".join(lines)


def main():
    directory = "."
    help_text = u"path to the {} directory (defaults to current directory)"
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", type=str, default=directory,
                        help=help_text.format("input", directory))
    parser.add_argument("-o", "--output", type=str, default=directory,
                        help=help_text.format("output", directory))
    a = parser.parse_args()
    # Pick up every .xml/.vtt file directly inside the input folder.
    filenames = [fn for fn in os.listdir(a.input)
                 if fn[-4:].lower() in SUPPORTED_EXTENSIONS]
    for fn in filenames:
        with codecs.open("{}/{}".format(a.input, fn), 'rb', "utf-8") as f:
            text = f.read()
        # Write the result next to the chosen output folder as <name>.srt.
        with codecs.open("{}/{}.srt".format(a.output, fn[:-4]), 'wb', "utf-8") as f:
            f.write(to_srt(text, fn[-4:]))


if __name__ == '__main__':
    main()
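To make the timing conversion concrete, here is a small standalone sketch of the same idea as convert_vtt_time: the decimal point becomes a comma, a missing hours field is filled in, and any cue settings after the end time are dropped. The function name vtt_time_to_srt and the sample line are made up for this illustration.

```python
def vtt_time_to_srt(line):
    """Convert one WebVTT timing line to SRT form (illustrative helper)."""
    # SRT uses a comma before the milliseconds.
    times = line.replace(".", ",").split(" --> ")
    # WebVTT may omit hours ("MM:SS,mmm" is 9 characters); SRT needs them.
    if len(times[0]) == 9:
        times = ["00:" + t for t in times]
    # Drop cue settings such as "align:start" that follow the end time.
    return "{} --> {}".format(times[0], times[1].split(" ")[0])


print(vtt_time_to_srt("00:01.000 --> 00:04.500 align:start position:0%"))
# prints: 00:00:01,000 --> 00:00:04,500
```

In the full script each converted timing line is then prefixed with a running cue number, which SRT requires and WebVTT does not.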