I. Background

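Our company's project always used human voice recordings. After it had been live for a while, we found that recording by hand was far too expensive, and only a few audio pieces went live each week. Once the boss noticed, he even shut the audio feature down for a while. Recently the project started expanding overseas and the content had to be internationalized, so we turned to machine speech synthesis (TTS). It is fairly mature by now: iFlytek does it well in China, and Microsoft does it well internationally. Since the project mainly targets overseas users, we chose Microsoft's TTS. (PS: this is not an ad. Microsoft's TTS really is good, and you can try the samples on the official site yourself. It cannot read poetry with human-like emotion, but it reads news and informational articles with natural rhythm and intonation.)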

II. Show Me the Code

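I've rambled on about the background, which I think some readers will care about. But as seasoned copy-paste coders, you probably care more about how to use it. So, as usual, straight to the code. (Signing up for an account is not covered here.)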

1. Dependency

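<dependency>
    <groupId>com.microsoft.cognitiveservices.speech</groupId>
    <artifactId>client-sdk</artifactId>
    <version>1.12.1</version>
</dependency>

Note that the code below calls the REST API directly through the JDK's HttpsURLConnection and never uses a class from this SDK, so this dependency is not strictly required for that approach.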

2. Constants

public class TtsConst {
    /** Output audio format (this one sounded best in my tests; feel free to try the others) */
    public static final String AUDIO_24KHZ_48KBITRATE_MONO_MP3 = "audio-24khz-48kbitrate-mono-mp3";
    /** Token endpoint */
    public static final String ACCESS_TOKEN_URI = "https://eastasia.api.cognitive.microsoft.com/sts/v1.0/issuetoken";
    /** API key */
    public static final String API_KEY = "your own API key";
    /** Cache the access token for 9 minutes (tokens are valid for 10) */
    public static final Integer ACCESS_TOKEN_EXPIRE_TIME = 9 * 60;
    /** Gender */
    public static final String MALE = "Male";
    /** TTS service endpoint */
    public static final String TTS_SERVICE_URI = "https://eastasia.tts.speech.microsoft.com/cognitiveservices/v1";
}
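Hard-coding the key and the eastasia endpoints works. If you deploy to more than one region, though, it can be handy to derive both URLs from the region name. The following is a minimal sketch that assumes the key and region come from environment variables. The variable names AZURE_SPEECH_KEY and AZURE_SPEECH_REGION and the class TtsSettings are hypothetical. The URL shapes simply follow the eastasia constants above.

```java
import java.util.Map;

/**
 * Hypothetical settings holder: reads the key from the environment and
 * derives both endpoints from the region instead of hard-coding them.
 */
final class TtsSettings {
    final String apiKey;
    final String accessTokenUri;
    final String ttsServiceUri;

    private TtsSettings(String apiKey, String region) {
        this.apiKey = apiKey;
        // Same URL shapes as the eastasia constants in TtsConst, with the region swapped in
        this.accessTokenUri = "https://" + region + ".api.cognitive.microsoft.com/sts/v1.0/issuetoken";
        this.ttsServiceUri = "https://" + region + ".tts.speech.microsoft.com/cognitiveservices/v1";
    }

    /** Pass System.getenv() in production; a plain map keeps it testable. */
    static TtsSettings fromEnv(Map<String, String> env) {
        String key = env.get("AZURE_SPEECH_KEY"); // hypothetical variable name
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("AZURE_SPEECH_KEY is not set");
        }
        // Fall back to the region the article uses
        String region = env.getOrDefault("AZURE_SPEECH_REGION", "eastasia");
        return new TtsSettings(key.trim(), region.trim());
    }
}
```

Keeping the key out of the source also means it does not end up in version control.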

3. HTTPS connection

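import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class HttpsConnection {
    public static HttpsURLConnection getHttpsConnection(String connectingUrl) throws Exception {
        URL url = new URL(connectingUrl);
        // The cast only succeeds for https:// URLs
        return (HttpsURLConnection) url.openConnection();
    }
}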

4. Authorization (access token)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.annotation.Resource;
import javax.net.ssl.HttpsURLConnection;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
// RedisCache, RedisKey and StringUtils come from the project (StringUtils: Spring's or Commons Lang's)

@Component
@Slf4j
public class Authentication {
    @Resource
    private RedisCache redisCache;

    public String genAccessToken() {
        HttpsURLConnection webRequest = null;
        try {
            String accessToken = redisCache.get(RedisKey.KEY_TTS_ACCESS_TOKEN);
            if (!StringUtils.isEmpty(accessToken)) {
                return accessToken;
            }
            webRequest = HttpsConnection.getHttpsConnection(TtsConst.ACCESS_TOKEN_URI);
            webRequest.setDoInput(true);
            webRequest.setDoOutput(true);
            webRequest.setConnectTimeout(5000);
            webRequest.setReadTimeout(5000);
            webRequest.setRequestMethod("POST");
            webRequest.setRequestProperty("Ocp-Apim-Subscription-Key", TtsConst.API_KEY);
            // Empty POST body; HttpURLConnection ignores a hand-set Content-Length by default,
            // so declare the length through the streaming mode instead
            webRequest.setFixedLengthStreamingMode(0);
            webRequest.getOutputStream().close();
            StringBuilder strBuffer = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(webRequest.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    strBuffer.append(line);
                }
            }
            accessToken = strBuffer.toString();
            // Cache the token for 9 minutes (tokens are valid for 10)
            redisCache.set(RedisKey.KEY_TTS_ACCESS_TOKEN, accessToken, TtsConst.ACCESS_TOKEN_EXPIRE_TIME);
            log.info("New tts access token obtained"); // avoid writing the token itself to the log
            return accessToken;
        } catch (Exception e) {
            log.error("Generate tts access token failed", e);
            return null;
        } finally {
            if (webRequest != null) {
                webRequest.disconnect();
            }
        }
    }
}
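If the project has no Redis, the same caching idea (fetch once, reuse for 9 minutes) can live in memory. The following is a minimal sketch that assumes a single-instance service. TokenCache is a hypothetical class, and the fetcher Supplier stands in for the HTTP call that genAccessToken makes. Unlike the Redis version, this cache is not shared between instances.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/**
 * Hypothetical in-memory alternative to the Redis-backed token cache.
 * Keeps one token and fetches a new one once the TTL has elapsed.
 */
final class TokenCache {
    private final Supplier<String> fetcher;   // e.g. the POST to ACCESS_TOKEN_URI
    private final Duration ttl;               // 9 minutes in the article
    private final Supplier<Instant> clock;    // Instant::now in production
    private String token;
    private Instant expiresAt;

    TokenCache(Supplier<String> fetcher, Duration ttl, Supplier<Instant> clock) {
        this.fetcher = fetcher;
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Returns the cached token, or fetches a fresh one if none is cached or it has expired. */
    synchronized String get() {
        Instant now = clock.get();
        if (token == null || !now.isBefore(expiresAt)) {
            String fresh = fetcher.get();
            if (fresh == null || fresh.isEmpty()) {
                return null; // fetch failed: don't cache the failure
            }
            token = fresh;
            expiresAt = now.plus(ttl);
        }
        return token;
    }
}
```

In a service you would construct it once, for example `new TokenCache(this::fetchToken, Duration.ofMinutes(9), Instant::now)`, where `fetchToken` is a hypothetical method holding the HTTP part of genAccessToken.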

5. Byte array helper

public class ByteArray {
    private byte[] data;
    private int length;

    public ByteArray() {
        length = 0;
        data = new byte[length];
    }

    public ByteArray(byte[] ba) {
        data = ba;
        length = ba.length;
    }

    /** Append `length` bytes of `second`, starting at `offset`, growing the buffer if needed */
    public void cat(byte[] second, int offset, int length) {
        if (this.length + length > data.length) {
            // Doubling max(capacity, length) always leaves room for this.length + length
            int allocatedLength = Math.max(data.length, length);
            byte[] allocated = new byte[allocatedLength << 1];
            System.arraycopy(data, 0, allocated, 0, this.length);
            System.arraycopy(second, offset, allocated, this.length, length);
            data = allocated;
        } else {
            System.arraycopy(second, offset, data, this.length, length);
        }
        this.length += length;
    }

    public void cat(byte[] second) {
        cat(second, 0, second.length);
    }

    /** Return the content trimmed to its actual length */
    public byte[] getArray() {
        if (length == data.length) {
            return data;
        }
        byte[] ba = new byte[length];
        System.arraycopy(data, 0, ba, 0, this.length);
        data = ba;
        return ba;
    }

    public int getLength() {
        return length;
    }
}
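The custom ByteArray class is essentially a growable byte buffer, and the JDK already provides one as java.io.ByteArrayOutputStream. The sketch below covers the two operations the article needs, reading a response stream and joining audio segments, using only the JDK. AudioBytes is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: ByteArrayOutputStream does what the custom ByteArray class does. */
final class AudioBytes {
    /** Reads an entire stream into memory, as the TTS service does with the response body. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Concatenates audio segments in order, which is what ByteArray.cat is used for. */
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length); // this overload does not throw IOException
        }
        return out.toByteArray();
    }
}
```

On Java 9 and later, `readAll` can be replaced by `InputStream.readAllBytes()`.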

6. Building the SSML document

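import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import lombok.extern.slf4j.Slf4j;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

@Slf4j
public class XmlDom {
    public static String createDom(String locale, String genderName, String voiceName, String textToSynthesize) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.newDocument();
        } catch (ParserConfigurationException e) {
            log.error("Create ssml document failed", e);
            return null;
        }
        Element speak = doc.createElement("speak");
        speak.setAttribute("version", "1.0");
        // Azure's SSML reference lists the W3C synthesis namespace as required on <speak>
        speak.setAttribute("xmlns", "http://www.w3.org/2001/10/synthesis");
        speak.setAttribute("xml:lang", "en-us");
        Element voice = doc.createElement("voice");
        voice.setAttribute("xml:lang", locale);
        voice.setAttribute("xml:gender", genderName);
        voice.setAttribute("name", voiceName);
        // A text node is XML-escaped (&, <, >) when serialized
        voice.appendChild(doc.createTextNode(textToSynthesize));
        speak.appendChild(voice);
        doc.appendChild(speak);
        return transformDom(doc);
    }

    private static String transformDom(Document doc) {
        StringWriter writer = new StringWriter();
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
        } catch (TransformerException e) {
            log.error("Transform ssml document failed", e);
            return null;
        }
        // Turn line breaks into spaces rather than deleting them, so words on adjacent lines don't merge
        return writer.toString().replaceAll("[\r\n]+", " ");
    }
}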
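Building SSML through the DOM has one quiet benefit: the text node is XML-escaped on serialization, so an article containing & or < still produces valid SSML. If you prefer plain string templates, you have to escape the text yourself. Below is a minimal sketch of that alternative. SsmlText is hypothetical, and the speak element follows the same layout as XmlDom, including the W3C synthesis namespace.

```java
/**
 * Hypothetical string-template alternative to XmlDom. Every value placed
 * into the markup is escaped so that article text cannot break the XML.
 */
final class SsmlText {
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String build(String locale, String gender, String voiceName, String text) {
        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-us\">"
                + "<voice xml:lang=\"" + escape(locale) + "\" xml:gender=\"" + escape(gender)
                + "\" name=\"" + escape(voiceName) + "\">"
                + escape(text)
                + "</voice></speak>";
    }
}
```

The DOM route in XmlDom avoids writing this by hand, which is a good reason to keep it.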

7. Here comes the star of the show: the TTS service

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.annotation.Resource;
import javax.net.ssl.HttpsURLConnection;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
// StringUtils: the same project helper used in Authentication

@Slf4j
@Component
public class TtsService {
    @Resource
    private Authentication authentication;

    /** Synthesize audio; returns an empty array (never null) on failure */
    public byte[] genAudioBytes(String textToSynthesize, String locale, String gender, String voiceName) {
        String accessToken = authentication.genAccessToken();
        if (StringUtils.isEmpty(accessToken)) {
            return new byte[0];
        }
        String body = XmlDom.createDom(locale, gender, voiceName, textToSynthesize);
        if (StringUtils.isEmpty(body)) {
            return new byte[0];
        }
        // Encode explicitly as UTF-8; the platform default charset can garble Chinese text
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        HttpsURLConnection webRequest = null;
        try {
            webRequest = HttpsConnection.getHttpsConnection(TtsConst.TTS_SERVICE_URI);
            webRequest.setDoInput(true);
            webRequest.setDoOutput(true);
            webRequest.setConnectTimeout(5000);
            webRequest.setReadTimeout(300000);
            webRequest.setRequestMethod("POST");
            webRequest.setRequestProperty("Content-Type", "application/ssml+xml");
            webRequest.setRequestProperty("X-Microsoft-OutputFormat", TtsConst.AUDIO_24KHZ_48KBITRATE_MONO_MP3);
            webRequest.setRequestProperty("Authorization", "Bearer " + accessToken);
            // Legacy headers kept from the original code; the current REST docs do not list them as required
            webRequest.setRequestProperty("X-Search-AppId", "07D3234E49CE426DAA29772419F436CC");
            webRequest.setRequestProperty("X-Search-ClientID", "1ECFAE91408841A480F00935DC390962");
            webRequest.setRequestProperty("User-Agent", "TTSAndroid");
            webRequest.setRequestProperty("Accept", "*/*");
            // Declares Content-Length; a hand-set Content-Length header is ignored by default
            webRequest.setFixedLengthStreamingMode(bytes.length);
            try (OutputStream out = webRequest.getOutputStream()) {
                out.write(bytes);
            }
            ByteArray ba = new ByteArray();
            byte[] buf = new byte[4096];
            try (InputStream inSt = webRequest.getInputStream()) {
                int n;
                while ((n = inSt.read(buf, 0, buf.length)) > 0) {
                    ba.cat(buf, 0, n);
                }
            }
            return ba.getArray();
        } catch (Exception e) {
            log.error("Synthesis tts speech failed", e);
            return new byte[0];
        } finally {
            if (webRequest != null) {
                webRequest.disconnect();
            }
        }
    }
}

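Because our project uploads the audio to OSS, this method returns the audio as a byte array. You could also download it or save it as a file instead.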
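If you want a local file rather than an OSS upload, java.nio.file.Files can write the byte array out directly. Below is a small sketch. AudioFiles and the target path are hypothetical, and the .mp3 extension assumes the audio-24khz-48kbitrate-mono-mp3 output format from TtsConst.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: persist synthesized audio locally instead of uploading it. */
final class AudioFiles {
    static Path save(byte[] audio, Path target) throws IOException {
        // genAudioBytes signals failure with an empty array, so refuse to write an empty file
        if (audio == null || audio.length == 0) {
            throw new IOException("No audio data to save");
        }
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(target, audio); // creates the file, or truncates an existing one
    }
}
```

Usage would look like `AudioFiles.save(ttsService.genAudioBytes(text, locale, gender, voiceName), Paths.get("out/chapter1.mp3"))`.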

III. Problems and Summary

1. Problem

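Our project needed audio longer than 10 minutes, and while debugging I found that the TTS service cannot produce more than 10 minutes of audio per request. Awkward. I spent ages searching Microsoft's site and found no way to generate longer audio. Give up? No way. Just when I was out of ideas, a phrase popped into my head: "resumable upload". Could I have the TTS service generate the content in segments as byte arrays, stitch them together, and then upload the result to OSS? I tried it right away, and to my surprise it worked. (It works because MP3 data is a sequence of self-contained frames, so plain byte concatenation generally plays back fine.) I can't describe how good that moment felt. Enough rambling, on to the code. I got a bit carried away there.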

    // These methods live in a class with an injected `TtsService ttsService`; they also need java.util.Arrays

    /** Generate Chinese audio: one request per chunk of at most 2600 characters */
    public byte[] getZHAudioBuffer(String gender, String chapterContent, String locale, String voiceName) {
        int maxLength = 2600;
        ByteArray byteArray = new ByteArray();
        for (int start = 0; start < chapterContent.length(); start += maxLength) {
            int end = Math.min(start + maxLength, chapterContent.length());
            byte[] part = ttsService.genAudioBytes(chapterContent.substring(start, end), locale, gender, voiceName);
            if (part == null || part.length == 0) {
                return new byte[0]; // one chunk failed: don't return audio with a gap in it
            }
            byteArray.cat(part);
        }
        return byteArray.getArray();
    }

    /** Generate English audio: one request per chunk of at most 1500 words */
    public byte[] getUSAudioBuffer(String gender, String chapterContent, String locale, String voiceName) {
        String[] words = chapterContent.split(" ");
        int maxLength = 1500;
        ByteArray byteArray = new ByteArray();
        for (int start = 0; start < words.length; start += maxLength) {
            int end = Math.min(start + maxLength, words.length);
            String part = String.join(" ", Arrays.copyOfRange(words, start, end));
            byte[] audio = ttsService.genAudioBytes(part, locale, gender, voiceName);
            if (audio == null || audio.length == 0) {
                return new byte[0];
            }
            byteArray.cat(audio);
        }
        return byteArray.getArray();
    }

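Everything I have to say is in the code; take your time with it. (PS: I found the limits of 2600 characters for Chinese and 1500 words for English by trial and error. With them, the generated audio stayed under 10 minutes in my tests.)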
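Cutting at a fixed position (half the length, or every N characters) can split a sentence across two requests, which may leave an audible seam. One alternative, not the author's method, is to cut each chunk after the last sentence-ending punctuation mark inside the length limit, and to fall back to a hard cut only when there is none. Below is a minimal sketch. TextChunker and the punctuation set are assumptions. It counts characters, so for English you would need to pick a character limit yourself, because the article's English limit is in words.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical chunker: splits text into pieces of at most maxChars characters,
 * preferring to cut right after sentence-ending punctuation.
 */
final class TextChunker {
    // Chinese 。！？； written as Unicode escapes, plus . ! ? ; and newline (assumed set)
    private static final String SENTENCE_END = "\u3002\uFF01\uFF1F\uFF1B.!?;\n";

    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int limit = Math.min(start + maxChars, text.length());
            int cut = limit; // hard cut if no sentence end is found
            if (limit < text.length()) {
                // Search backwards for the last sentence end inside the window
                for (int i = limit - 1; i > start; i--) {
                    if (SENTENCE_END.indexOf(text.charAt(i)) >= 0) {
                        cut = i + 1;
                        break;
                    }
                }
            }
            chunks.add(text.substring(start, cut));
            start = cut;
        }
        return chunks;
    }
}
```

Each chunk would then go through `ttsService.genAudioBytes(...)` and be appended with `ByteArray.cat`, exactly as in the methods above.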

2. Summary

Microsoft's TTS is pretty great. Well, that sums it up nicely; I'm off to explore its other features.
