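Contents:

1 Setting up the environment
2 How to use Real-Time-Voice-Cloning
- 2.1 Download the pretrained models
- 2.2 Test that the environment works (optional)
- 2.3 Download a dataset (optional)
- 2.4 Run the toolbox
3 Common errors
- 3.1 Error 1: `OSError: PortAudio library not found`
4 Training on datasets in other languages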

1 Setting up the environment

The versions I installed, in a virtual environment named tf1:
tensorflow-gpu==1.15.0
torch==1.4.0
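
To confirm that the active environment really has these versions, something like the following sketch may help. It only compares version strings; `EXPECTED`, `lookup_version`, and `check_versions` are names I made up for illustration and are not part of Real-Time-Voice-Cloning.

```python
# Sketch: compare installed package versions against the pins listed above.
# EXPECTED mirrors the versions in this article; the helper names are
# hypothetical and not part of Real-Time-Voice-Cloning.

EXPECTED = {"tensorflow-gpu": "1.15.0", "torch": "1.4.0"}


def lookup_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        # Available on Python 3.8 and later.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Older interpreters (for example Python 3.6) fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_versions(expected, lookup=lookup_version):
    """Map each package name to a tuple (wanted, found, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        report[package] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: wanted {wanted}, found {found} [{status}]")
```

The comparison is an exact string match, so a build whose version string carries an extra suffix would be reported as a mismatch even if it works.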

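2 How to use Real-Time-Voice-Cloning

2.1 Download the pretrained models

1. The official download links for the pretrained models:

Google Drive download
Dropbox download

2. Download the packaged pretrained models (pretrained.zip) and simply unzip the archive: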

unzip pretrained.zip

After unzipping, the files are placed automatically at the following paths:

encoder\saved_models\pretrained.pt
synthesizer\saved_models\logs-pretrained\taco_pretrained\checkpoint
synthesizer\saved_models\logs-pretrained\taco_pretrained\tacotron_model.ckpt-278000.data-00000-of-00001
synthesizer\saved_models\logs-pretrained\taco_pretrained\tacotron_model.ckpt-278000.index
synthesizer\saved_models\logs-pretrained\taco_pretrained\tacotron_model.ckpt-278000.meta
vocoder\saved_models\pretrained\pretrained.pt
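
A quick way to verify that the archive landed where the toolbox expects it is to check for each file. The sketch below assumes you run it against the repository root; `EXPECTED_FILES` and `missing_model_files` are hypothetical names of mine, and the paths are the ones listed above, rewritten with forward slashes.

```python
from pathlib import Path

# Paths copied from the listing above, written with forward slashes so that
# pathlib resolves them on Linux as well as on Windows.
EXPECTED_FILES = [
    "encoder/saved_models/pretrained.pt",
    "synthesizer/saved_models/logs-pretrained/taco_pretrained/checkpoint",
    "synthesizer/saved_models/logs-pretrained/taco_pretrained/"
    "tacotron_model.ckpt-278000.data-00000-of-00001",
    "synthesizer/saved_models/logs-pretrained/taco_pretrained/"
    "tacotron_model.ckpt-278000.index",
    "synthesizer/saved_models/logs-pretrained/taco_pretrained/"
    "tacotron_model.ckpt-278000.meta",
    "vocoder/saved_models/pretrained/pretrained.pt",
]


def missing_model_files(repo_root):
    """Return the expected model files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing pretrained files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All pretrained model files are in place.")
```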

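2.2 Test that the environment works (optional)

You can first run a test with the following command:

python demo_cli.py

If the test finishes without errors, the environment is fine. This step is optional, so you can skip it.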

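2.3 Download a dataset (optional)

If you only want to use the toolbox, I recommend downloading just LibriSpeech/train-clean-100. Extract the contents to <datasets_root>/LibriSpeech/train-clean-100, where <datasets_root> is a directory of your choice. The toolbox supports other datasets as well; toolbox/__init__.py lists them. You are free not to download any dataset, but you will then need your own data as audio files, or you will have to record it with the toolbox.

LibriSpeech/train-clean-100: https://www.openslr.org/resources/12/train-clean-100.tar.gz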
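
The download and extraction can also be scripted. The sketch below assumes that the archive's top-level directory is LibriSpeech/, so that extracting it into <datasets_root> yields the layout described above; please check this against your own download. `download` and `extract_dataset` are hypothetical helper names, and the archive is several gigabytes, so the download takes a while.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from this article.
URL = "https://www.openslr.org/resources/12/train-clean-100.tar.gz"


def download(url, dest):
    """Download url to dest unless the file is already there."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_dataset(archive, datasets_root):
    """Extract the archive into datasets_root and return the expected folder.

    Assumption: the archive contains a top-level LibriSpeech/ directory.
    """
    datasets_root = Path(datasets_root)
    datasets_root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive)) as tar:
        tar.extractall(str(datasets_root))
    return datasets_root / "LibriSpeech" / "train-clean-100"


# Example usage (commented out because it triggers a large download):
# archive = download(URL, "train-clean-100.tar.gz")
# print(extract_dataset(archive, "datasets"))
```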

2.4 Run the toolbox

To run the toolbox when you have downloaded a dataset, use the following command:

python demo_toolbox.py -d <datasets_root>

If you have not downloaded a dataset, run it directly:

python demo_toolbox.py
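
If you launch the toolbox from a script, the choice between the two commands can be made automatically. This is only a sketch: `toolbox_command` is a hypothetical helper of mine, and the only flag it uses is the `-d` option shown above.

```python
import sys
from pathlib import Path


def toolbox_command(datasets_root=None):
    """Build the demo_toolbox.py command line as a list of arguments.

    The -d option is added only when datasets_root is an existing directory;
    otherwise the toolbox is started without a dataset.
    """
    cmd = [sys.executable, "demo_toolbox.py"]
    if datasets_root is not None and Path(datasets_root).is_dir():
        cmd += ["-d", str(datasets_root)]
    return cmd


# Example usage, run from the repository root:
# import subprocess
# subprocess.call(toolbox_command("datasets"))
```

Using `sys.executable` keeps the launch inside the currently active interpreter, which matters when the dependencies live in a virtual environment such as tf1.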

3 Common errors

3.1 Error 1: `OSError: PortAudio library not found`

1. The error: running python demo_toolbox.py fails immediately with OSError: PortAudio library not found

(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$ python demo_toolbox.py -h
/home/shl/shl_res/1_project/Real-Time-Voice-Cloning/encoder/audio.py:13: UserWarning: Unable to import 'webrtcvad'. This package enables noise removal and is recommended.
  warn("Unable to import 'webrtcvad'. This package enables noise removal and is recommended.")
Traceback (most recent call last):
  File "demo_toolbox.py", line 2, in <module>
    from toolbox import Toolbox
  File "/home/shl/shl_res/1_project/Real-Time-Voice-Cloning/toolbox/__init__.py", line 1, in <module>
    from toolbox.ui import UI
  File "/home/shl/shl_res/1_project/Real-Time-Voice-Cloning/toolbox/ui.py", line 10, in <module>
    import sounddevice as sd
  File "/home/shl/anaconda3/envs/tf1/lib/python3.6/site-packages/sounddevice.py", line 71, in <module>
    raise OSError('PortAudio library not found')
OSError: PortAudio library not found
(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$

2. Fix: install the libportaudio2 package:

sudo apt-get install libportaudio2

(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$ sudo apt-get install libportaudio2
[sudo] password for shl:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  libportaudio2
0 upgraded, 1 newly installed, 0 to remove and 394 not upgraded.
Need to get 64.6 kB of archives.
After this operation, 215 kB of additional disk space will be used.
Get:1 http://mirrors.aliyun.com/ubuntu bionic/universe amd64 libportaudio2 amd64 19.6.0-1 [64.6 kB]
Fetched 64.6 kB in 0s (341 kB/s)
Selecting previously unselected package libportaudio2:amd64.
(Reading database ... 356909 files and directories currently installed.)
Preparing to unpack .../libportaudio2_19.6.0-1_amd64.deb ...
Unpacking libportaudio2:amd64 (19.6.0-1) ...
Setting up libportaudio2:amd64 (19.6.0-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.3) ...
/sbin/ldconfig.real: /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link
(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$

3. Run the command again; the error is gone:

(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$ python demo_toolbox.py -h
/home/shl/shl_res/1_project/Real-Time-Voice-Cloning/encoder/audio.py:13: UserWarning: Unable to import 'webrtcvad'. This package enables noise removal and is recommended.
  warn("Unable to import 'webrtcvad'. This package enables noise removal and is recommended.")
/home/shl/anaconda3/envs/tf1/lib/python3.6/site-packages/umap/__init__.py:9: UserWarning: Tensorflow not installed; ParametricUMAP will be unavailable
  warn("Tensorflow not installed; ParametricUMAP will be unavailable")
usage: demo_toolbox.py [-h] [-d DATASETS_ROOT] [-e ENC_MODELS_DIR]
                       [-s SYN_MODELS_DIR] [-v VOC_MODELS_DIR] [--low_mem]
                       [--seed SEED] [--no_mp3_support]

Runs the toolbox

optional arguments:
  -h, --help            show this help message and exit
  -d DATASETS_ROOT, --datasets_root DATASETS_ROOT
                        Path to the directory containing your datasets. See
                        toolbox/__init__.py for a list of supported datasets.
                        (default: None)
  -e ENC_MODELS_DIR, --enc_models_dir ENC_MODELS_DIR
                        Directory containing saved encoder models (default:
                        encoder/saved_models)
  -s SYN_MODELS_DIR, --syn_models_dir SYN_MODELS_DIR
                        Directory containing saved synthesizer models
                        (default: synthesizer/saved_models)
  -v VOC_MODELS_DIR, --voc_models_dir VOC_MODELS_DIR
                        Directory containing saved vocoder models (default:
                        vocoder/saved_models)
  --low_mem             If True, the memory used by the synthesizer will be
                        freed after each use. Adds large overhead but allows
                        to save some GPU memory for lower-end GPUs. (default:
                        False)
  --seed SEED           Optional random number seed value to make toolbox
                        deterministic. (default: None)
  --no_mp3_support      If True, no mp3 files are allowed. (default: False)
(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$
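
To check for PortAudio before starting the toolbox, you can ask Python's own library lookup. This sketch assumes that sounddevice locates the library through `ctypes.util.find_library`, which is my understanding of its behavior rather than something verified here; `portaudio_path` is a hypothetical helper name.

```python
from ctypes.util import find_library


def portaudio_path():
    """Return the library name or path found for PortAudio, or None."""
    return find_library("portaudio")


if __name__ == "__main__":
    found = portaudio_path()
    if found is None:
        print("PortAudio not found; on Ubuntu try: sudo apt-get install libportaudio2")
    else:
        print("PortAudio found: " + found)
```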

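4 Training on datasets in other languages

At the moment the models provided by the author support only English. Training on other languages should also be possible (see the project's issues), but the requirements on the dataset appear to be high:

- You need a speech dataset of 300 hours in the target language
- The audio quality must be good

I have not tried this part myself.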
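
Before attempting this, it may be worth measuring how much audio you actually have. The sketch below sums the duration of all WAV files under a directory with the standard `wave` module. It assumes uncompressed .wav files (FLAC or MP3 would need another reader), and `wav_seconds` and `total_hours` are hypothetical helper names.

```python
import wave
from pathlib import Path

# Threshold taken from the requirement quoted above.
REQUIRED_HOURS = 300


def wav_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_hours(dataset_root):
    """Sum the duration of every .wav file under dataset_root, in hours."""
    root = Path(dataset_root)
    return sum(wav_seconds(p) for p in root.rglob("*.wav")) / 3600.0


if __name__ == "__main__":
    hours = total_hours(".")
    print(f"{hours:.1f} hours of audio (target: {REQUIRED_HOURS})")
```

This only measures quantity; it says nothing about the audio quality, which is the second requirement.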

Reference 1: https://zhuanlan.zhihu.com/p/72589678
Reference 2: https://blog.csdn.net/Lucas23/article/details/107765779
Reference 3: https://juejin.cn/post/6872260516966842382
Reference 4: https://zhuanlan.zhihu.com/p/112627134 # introduction to the underlying principles
Reference 5: https://github.com/KuangDD/zhrtvc # an implementation for Chinese
Reference 6: https://www.youtube.com/watch?reload=9&v=-O_hYhToKoA # YouTube tutorial

