Converting VTT subtitles to SRT with Python

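I recently downloaded some subtitles in VTT format, which many video players cannot open, so I looked for a conversion script. Most of the code comes from GitHub; I added one function and made a few small modifications.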

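Usage:

1. Create a new text file and rename it, extension included, to vtt2srt.py (any other name works too), then paste the code into it.

2. Copy vtt2srt.py into the folder that contains the VTT files you want to convert.

3. Hold Shift and right-click inside that folder, open PowerShell or cmd there, and run the following command: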

python vtt2srt.py -i ./ -o ./

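-i is followed by the input folder. The script automatically searches it for .vtt files (the code also picks up .xml files); use ./ for the current folder.

-o is followed by the output folder; use ./ to write the results to the current folder.

Both arguments are optional and default to the current folder.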
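To illustrate why both arguments can be left out, here is a minimal sketch of the argument handling. It mirrors the -i and -o options of the script; `build_parser` is a hypothetical helper name, and the folder names `subs` and `out` are made up for this example.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors the -i / -o options of vtt2srt.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", type=str, default=".",
                        help="path to the input directory")
    parser.add_argument("-o", "--output", type=str, default=".",
                        help="path to the output directory")
    return parser


# No arguments given: both folders fall back to the current directory.
defaults = build_parser().parse_args([])
print(defaults.input, defaults.output)

# Explicit folders (made-up names, for illustration only).
custom = build_parser().parse_args(["-i", "subs", "-o", "out"])
print(custom.input, custom.output)
```

So running the script with no arguments should behave the same as passing -i ./ -o ./.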

The code is as follows:

import argparse
import codecs
import math
import os
import re

SUPPORTED_EXTENSIONS = [".xml", ".vtt"]


def leading_zeros(value, digits=2):
    # Left-pad a number with zeros to the requested width.
    value = "000000" + str(value)
    return value[-digits:]


def convert_time(raw_time):
    # Convert a raw tick string from the .xml format to HH:MM:SS,mmm.
    if int(raw_time) == 0:
        return "00:00:00,000"

    ms = '000'
    if len(raw_time) > 4:
        ms = leading_zeros(int(raw_time[:-4]) % 1000, 3)
    time_in_seconds = int(raw_time[:-7]) if len(raw_time) > 7 else 0
    second = leading_zeros(time_in_seconds % 60)
    minute = leading_zeros(int(math.floor(time_in_seconds / 60)) % 60)
    hour = leading_zeros(int(math.floor(time_in_seconds / 3600)))
    return "{}:{}:{},{}".format(hour, minute, second, ms)


def xml_id_display_align_before(text):
    r"""
    displayAlign="before" means the current sub will be displayed on top
    and not at the bottom. We check what's the xml:id associated to it
    to have an {\an8} position tag in the output file.
    """
    align_before_re = re.compile(
        '<region.*tts:displayAlign="before".*xml:id="(.*)"/>')
    has_align_before = re.search(align_before_re, text)
    if has_align_before:
        return has_align_before.group(1)
    return u""


def xml_get_cursive_style_ids(text):
    style_section = re.search("<styling>(.*)</styling>", text, flags=re.DOTALL)
    if not style_section:
        return []
    style_ids_re = re.compile(
        '<style.* tts:fontStyle="italic".* xml:id="([a-zA-Z0-9_.]+)"')
    return [re.search(style_ids_re, line).groups()[0]
            for line in style_section.group().split("\n")
            if re.search(style_ids_re, line)]


def xml_cleanup_spans_start(span_id_re, cursive_ids, text):
    has_cursive = []
    span_start_tags = re.findall(span_id_re, text)
    for s in span_start_tags:
        has_cursive.append(u"<i>" if s[1] in cursive_ids else u"")
        text = has_cursive[-1].join(text.split(s[0], 1))
    return text, has_cursive


def xml_cleanup_spans_end(span_end_re, text, has_cursive):
    span_end_tags = re.findall(span_end_re, text)
    for s, cursive in zip(span_end_tags, has_cursive):
        cursive = u"</i>" if cursive else u""
        text = cursive.join(text.split(s, 1))
    return text


def to_srt(text, extension):
    if extension.lower() == ".xml":
        return xml_to_srt(text)
    if extension.lower() == ".vtt":
        return vtt_to_srt(text)


def format_text(line):
    # Drop the first &lrm; entity and strip every <...> markup tag.
    line = line.replace("&lrm;", "", 1)
    return re.sub(r"<[^>]*>", "", line)


def convert_vtt_time(line):
    times = line.replace(".", ",").split(" --> ")
    if len(times[0]) == 9:  # mm:ss,mmm without an hour field
        times = ["00:" + t for t in times]
    # Cue settings after the end time (e.g. "align:start") are dropped.
    return "{} --> {}".format(times[0], times[1].split(" ")[0])


def vtt_to_srt(text):
    if not text.startswith(u"\ufeffWEBVTT") and not text.startswith(u"WEBVTT"):
        raise Exception(".vtt format must start with WEBVTT, wrong file?")

    lines = []
    current_sub_line = []
    for line in text.split("\n"):
        if current_sub_line:
            line = format_text(line)
            current_sub_line.append(line)
            if not line or line == "\r":
                # A blank line closes the current cue.
                lines.append("\n".join(current_sub_line) + "\n")
                current_sub_line = []

        elif " --> " in line:
            current_sub_line = [convert_vtt_time(line)]
    if current_sub_line:
        lines.append("\n".join(current_sub_line))

    return "".join((u"{}\n{}".format(i, l) for i, l in enumerate(lines, 1)))


def xml_to_srt(text):
    def append_subs(start, end, prev_content, format_time):
        subs.append({
            "start_time": convert_time(start) if format_time else start,
            "end_time": convert_time(end) if format_time else end,
            "content": u"\n".join(prev_content),
        })

    display_align_before = xml_id_display_align_before(text)
    begin_re = re.compile(r"\s*<p begin=")
    sub_lines = (l for l in text.split("\n") if re.search(begin_re, l))
    subs = []
    prev_time = {"start": 0, "end": 0}
    prev_content = []
    start = end = ''
    start_re = re.compile(r'begin="([0-9:.]*)')
    end_re = re.compile(r'end="([0-9:.]*)')
    content_re = re.compile(r'">(.*)</p>')

    # some span tags are used for italics, we'll replace them by <i> and </i>,
    # which is the standard for .srt files. We ignore all other uses.
    cursive_ids = xml_get_cursive_style_ids(text)
    span_id_re = re.compile(r'(<span style="([a-zA-Z0-9_.]+)">)+')
    span_end_re = re.compile(r'(</span>)+')
    br_re = re.compile(r'(<br\s*/?>)+')
    fmt_t = True
    for s in sub_lines:
        s, has_cursive = xml_cleanup_spans_start(span_id_re, cursive_ids, s)

        string_region_re = (r'<p(.*region="' + display_align_before
                            + r'".*")>(.*)</p>')
        s = re.sub(string_region_re, r'<p\1>{\\an8}\2</p>', s)
        content = re.search(content_re, s).group(1)

        br_tags = re.search(br_re, content)
        if br_tags:
            content = u"\n".join(content.split(br_tags.group()))

        content = xml_cleanup_spans_end(span_end_re, content, has_cursive)

        prev_start = prev_time["start"]
        start = re.search(start_re, s).group(1)
        end = re.search(end_re, s).group(1)
        if len(start.split(":")) > 1:
            fmt_t = False
            start = start.replace(".", ",")
            end = end.replace(".", ",")
        if (prev_start == start and prev_time["end"] == end) or not prev_start:
            # Fix for multiple lines starting at the same time
            prev_time = {"start": start, "end": end}
            prev_content.append(content)
            continue
        append_subs(prev_time["start"], prev_time["end"], prev_content, fmt_t)
        prev_time = {"start": start, "end": end}
        prev_content = [content]
    append_subs(start, end, prev_content, fmt_t)

    lines = (u"{}\n{} --> {}\n{}\n".format(
        s + 1, subs[s]["start_time"], subs[s]["end_time"], subs[s]["content"])
        for s in range(len(subs)))
    return u"\n".join(lines)


def main():
    directory = "."
    help_text = u"path to the {} directory (defaults to current directory)"
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", type=str, default=directory,
                        help=help_text.format("input", directory))
    parser.add_argument("-o", "--output", type=str, default=directory,
                        help=help_text.format("output", directory))
    a = parser.parse_args()
    filenames = [fn for fn in os.listdir(a.input)
                 if fn[-4:].lower() in SUPPORTED_EXTENSIONS]
    for fn in filenames:
        with codecs.open(os.path.join(a.input, fn), 'rb', "utf-8") as f:
            text = f.read()
        # Write <name>.srt next to the chosen output folder.
        out_path = os.path.join(a.output, fn[:-4] + ".srt")
        with codecs.open(out_path, 'wb', "utf-8") as f:
            f.write(to_srt(text, fn[-4:]))


if __name__ == '__main__':
    main()
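As a quick sanity check of what happens to a single cue, the sketch below re-implements only the timestamp and tag-stripping steps in a self-contained form. `strip_tags` is an illustrative stand-in for `format_text`, `convert_cue` is a hypothetical helper, and the sample cue is made up, so treat this as a simplified approximation and not as the script itself.

```python
import re


def convert_vtt_time(line):
    # Turn a VTT timing line into SRT form: dots become commas and a
    # missing hour field is filled in with "00:".
    times = line.replace(".", ",").split(" --> ")
    if len(times[0]) == 9:  # mm:ss,mmm with no hour field
        times = ["00:" + t for t in times]
    # Drop cue settings such as "align:start" after the end time.
    return "{} --> {}".format(times[0], times[1].split(" ")[0])


def strip_tags(line):
    # Illustrative stand-in for format_text: drop &lrm; and <...> markup.
    return re.sub(r"<[^>]*>", "", line.replace("&lrm;", "", 1))


def convert_cue(index, timing, text_lines):
    # Hypothetical helper: assemble one numbered SRT cue.
    body = "\n".join(strip_tags(t) for t in text_lines)
    return "{}\n{}\n{}\n".format(index, convert_vtt_time(timing), body)


# Made-up sample cue, for illustration only.
print(convert_cue(1, "00:01.000 --> 00:04.000 align:start",
                  ["<c.white>Hello, <i>world</i></c>"]))
```

The printed cue has the index on the first line, the timing line rewritten as 00:00:01,000 --> 00:00:04,000, and the text with its markup removed.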
