[API Analysis] Calling Microsoft's official text-to-speech demo, step by step

1. Sources

  • GitHub: MsEdgeTTS
  • 52pojie forum: Microsoft voice assistant, free edition (multi-function, first public release)
  • Microsoft demo: Text to Speech, speechSDK.js, text-to-speech.js

2. Preparation

  • Feature source: the Edge browser
  • Packet capture tool: Fiddler
  • Request simulation: Postman

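3. Main analysis steps

  • Step 1: Click the play button of the text-to-speech demo. The Network tab of the browser developer tools directly shows a wss connection to wss://eastus.tts.speech.microsoft.com/cognitiveservices/websocket/v1?Authorization=bearer%20{token}, and Fiddler captures the same wss request. The relevant code from the page: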
// Source: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech
// Initializes the token and region
var localizedResources = {
    token: "eyJhbGciOiJodHRwOi8vd3d3LnczLm9yZy8yMDAxLzA0L3htbGRzaWctbW9yZSNobWFjLXNoYTI1NiIsInR5cCI6IkpXVCJ9.eyJyZWdpb24iOiJlYXN0dXMiLCJzdWJzY3JpcHRpb24taWQiOiI2MWIxODBlMmJkOGU0YWI2OGNiNmQxN2UxOWE5NjAwMiIsInByb2R1Y3QtaWQiOiJTcGVlY2hTZXJ2aWNlcy5TMCIsImNvZ25pdGl2ZS1zZXJ2aWNlcy1lbmRwb2ludCI6Imh0dHBzOi8vYXBpLmNvZ25pdGl2ZS5taWNyb3NvZnQuY29tL2ludGVybmFsL3YxLjAvIiwiYXp1cmUtcmVzb3VyY2UtaWQiOiIvc3Vic2NyaXB0aW9ucy9jMjU1ZGYzNi05NzRjLTQ2MGEtODMwYi0yNTE2NTEzYWNlYjIvcmVzb3VyY2VHcm91cHMvY3MtY29nbml0aXZlc2VydmljZXMtcHJvZC13dXMyL3Byb3ZpZGVycy9NaWNyb3NvZnQuQ29nbml0aXZlU2VydmljZXMvYWNjb3VudHMvYWNvbS1zcGVlY2gtcHJvZC1lYXN0dXMiLCJzY29wZSI6InNwZWVjaHNlcnZpY2VzIiwiYXVkIjoidXJuOm1zLnNwZWVjaHNlcnZpY2VzLmVhc3R1cyIsImV4cCI6MTY1NzU0MjgyMywiaXNzIjoidXJuOm1zLmNvZ25pdGl2ZXNlcnZpY2VzIn0.vI3ferw2AowktlDmmrMLvr-XVJicjm8gagPie59UZbc",
    region: "eastus",
    srComplete: "Done Recognizing Speech",
    srStartFailure: "Cannot Recognize Speech",
    srCanceledError: "Recognition was canceled due to error ",
    srStartSpeaking: "Start Speaking",
    srTryAgain: "An error occurred while loading this demo, please reload and try again",
    srTooManyFiles: "This demo supports a maximum of 5 files.",
    ttsPitch: "Pitch",
    ttsSpeed: "Speaking speed",
    ttsPreview: "Preview",
    ttsDefaultText: {...}
}

// Source: https://azurecomcdn.azureedge.net/cvt-f187b0e8321af2f3c7299619208c62b4c1e44f0eb595e2abd9bc3207f2c90b3e/scripts/Acom/Components/cognitiveServicesDemos/speechJsSdk/textToSpeech.js
// Code that fetches the list of voices
$.ajax({
    url: 'https://' + localizedResources.region + '.tts.speech.microsoft.com/cognitiveservices/voices/list',
    type: 'GET',
    beforeSend: function textToSpeechVoiceListBeforeAjaxSend(xhr) {
        xhr.setRequestHeader('Authorization', 'Bearer ' + localizedResources.token);
    },
    success: function textToSpeechVoiceListAjaxSuccess(data) {
        // put neural voices in front.
        var sorted = data.sort(function (a, b) {
            return a.VoiceType.localeCompare(b.VoiceType);
        });
        $.each(sorted, function (_index, element) {
            var displayName = element.DisplayName;
            if (element.Status === 'Deprecated') {
                // Don't show deprecated voices.
                return;
            }
            if (!voiceList[element.Locale]) {
                voiceList[element.Locale] = '';
            }
            if (element.VoiceType === 'Neural') {
                displayName += ' (Neural)';
            }
            if (element.LocalName !== element.DisplayName) {
                displayName += ' - ' + element.LocalName;
            }
            if (element.Status === 'Preview') {
                displayName += ' - ' + localizedResources.ttsPreview;
            }
            voiceList[element.Locale] += '<option value="' + element.ShortName + '">' + displayName + '</option>';
            styleList[element.ShortName] = element.StyleList;
            rolePlayList[element.ShortName] = element.RolePlayList;
            secondaryLocaleList[element.ShortName] = element.SecondaryLocaleList;
        });
        language.onchange();
    },
    error: function textToSpeechVoiceListAjaxError(_jqXHR, _textStatus, error) {
        status.innerText = localizedResources.srTryAgain;
        global.Core.Util.TrackException('A Text To Speech voice list API Ajax error occurred: ' + error);
    }
});

// Source: https://azurecomcdn.azureedge.net/cvt-f187b0e8321af2f3c7299619208c62b4c1e44f0eb595e2abd9bc3207f2c90b3e/scripts/Acom/Components/cognitiveServicesDemos/speechJsSdk/textToSpeech.js
// Function triggered by the play button
function SpeakOnce() {
    var config = SpeechSDK.SpeechTranslationConfig.fromAuthorizationToken(localizedResources.token, localizedResources.region),
        synthesizer,
        audioConfig;

    // due to a bug in Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=1028206)
    // mp3 playback has some beeps, using a higher bitrate here as a workaround.
    config.speechSynthesisOutputFormat = SpeechSDK.SpeechSynthesisOutputFormat.Audio24Khz160KBitRateMonoMp3;
    player = new SpeechSDK.SpeakerAudioDestination();
    player.onAudioEnd = function () {
        stopli.hidden = true;
        playli.hidden = false;
    };
    audioConfig = SpeechSDK.AudioConfig.fromSpeakerOutput(player);
    synthesizer = new SpeechSDK.SpeechSynthesizer(config, audioConfig);
    synthesizer.synthesisCompleted = function () {
        synthesizer.close();
        synthesizer = null;
    };
    synthesizer.SynthesisCanceled = function (s, e) {
        var details;
        stopli.hidden = true;
        playli.hidden = false;
        details = SpeechSDK.CancellationDetails.fromResult(e);
        if (details.reason === SpeechSDK.CancellationReason.Error) {
            status.innerText = localizedResources.srTryAgain;
        }
    };
    synthesizer.speakSsmlAsync(ssml.value, function () { }, function (error) {
        status.innerText = localizedResources.srTryAgain + ' ' + error;
    });
}
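The wss address seen in the capture is simply the region and the token combined. A minimal sketch of assembling it, assuming the token is sent as a URL-encoded "bearer <token>" query value exactly as in the captured request; the helper name buildTtsWebSocketUrl is made up for illustration:

```javascript
// Hypothetical helper: rebuilds the wss endpoint observed in the network capture.
function buildTtsWebSocketUrl(region, token) {
  const base = `wss://${region}.tts.speech.microsoft.com/cognitiveservices/websocket/v1`;
  // encodeURIComponent turns the space in "bearer <token>" into %20;
  // the characters used in the token (letters, digits, ".", "-", "_") are left untouched.
  return `${base}?Authorization=${encodeURIComponent("bearer " + token)}`;
}

// Example with a placeholder token instead of a real one.
console.log(buildTtsWebSocketUrl("eastus", "abc.def"));
```

Passing the region and token read from localizedResources should reproduce the address shown in the developer tools.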
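  • Step 2: Compare this with the API call sequence of Edge's read-aloud feature analyzed earlier. The steps are quite similar, except that a token must first be obtained from Microsoft's text-to-speech page before anything else can be done.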
/*
 * Simulated successfully in Postman
 * Get the token and region from the official page
 * http url: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech
 * method: GET
 */
var region, token
{
    uri: "https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech",
    method: "GET"
}

/*
 * Simulated successfully in Postman
 * Get the available voices; requires the token and region from the official page
 * http url: https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list
 * headers: { Authorization: `bearer ${token}` }
 * method: GET
 */
{
    uri: `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`,
    headers: { Authorization: `bearer ${token}` },
    method: "GET"
}

/*
 * Simulated successfully in Postman
 * Open the wss connection and exchange text and audio data, simulating the requests speechSDK sends from SpeakOnce()
 * wss url: wss://${region}.tts.speech.microsoft.com/cognitiveservices/websocket/v1?Authorization=bearer%20${token}
 * send: first generate a random requestid (a GUID with the "-" separators removed), then send three messages
 *       (1: the speechSDK environment, 2: the audio format, 3: the SSML text)
 * receive: the audio bytes are in the body of the messages carrying the same requestid; use Path:audio\r\n to locate where the body starts
 * Note: unlike the Edge read-aloud interface, the audio format can be chosen freely according to the official documentation
 */
{
    uri: `wss://${region}.tts.speech.microsoft.com/cognitiveservices/websocket/v1`,
    query: { Authorization: `bearer%20${token}` },
    sendmessage: {
        speechconfig: `
Path: speech.config
X-RequestId: 095E1E12004641208D62F656AC26CED6
X-Timestamp: 2022-07-11T10:45:52.938Z
Content-Type: application/json

{"context":{"system":{"name":"SpeechSDK","version":"1.19.0","build":"JavaScript","lang":"JavaScript"},"os":{"platform":"Browser/Win32","name":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49","version":"5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49"}}}`,
        synthesiscontext: `
Path: synthesis.context
X-RequestId: 095E1E12004641208D62F656AC26CED6
Content-Type: application/json

{"synthesis":{"audio":{"metadataOptions":{"bookmarkEnabled":false,"sentenceBoundaryEnabled":false,"visemeEnabled":false,"wordBoundaryEnabled":false},"outputFormat":"audio-24khz-160kbitrate-mono-mp3"},"language":{"autoDetection":false}}}`,
        ssml: `
Path: ssml
X-RequestId: 095E1E12004641208D62F656AC26CED6
Content-Type: application/ssml+xml

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US"><voice name="en-US-JennyNeural"><prosody rate="0%" pitch="0%">You can replace this text with any text you wish. You can either write in this text box or paste your own text here.
Try different languages and voices. Change the speed and the pitch of the voice. You can even tweak the SSML (Speech Synthesis Markup Language) to control how the different sections of the text sound. Click on SSML above to give it a try!
Enjoy using Text to Speech!</prosody></voice></speak>`
    }
}
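The three outgoing messages share one layout: header lines, a blank line, then the body. A rough sketch of building them is shown here. The function names (newRequestId, frame, buildMessages) are invented for illustration, Math.random stands in for a real GUID generator, and whether the service accepts a smaller context object than the captured one has not been checked, so the captured fields are reproduced as they are.

```javascript
// Hypothetical: 32 uppercase hex characters, i.e. the shape of a GUID without "-".
function newRequestId() {
  let id = "";
  for (let i = 0; i < 32; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id.toUpperCase();
}

// Headers, an empty line, then the body, all separated by CRLF.
function frame(path, requestId, contentType, body) {
  return `Path: ${path}\r\nX-RequestId: ${requestId}\r\nContent-Type: ${contentType}\r\n\r\n${body}`;
}

// Returns the three text messages in the order they are sent.
function buildMessages(requestId, outputFormat, ssml, userAgent) {
  const speechConfig = {
    context: {
      system: { name: "SpeechSDK", version: "1.19.0", build: "JavaScript", lang: "JavaScript" },
      os: {
        platform: "Browser/Win32",
        name: userAgent,
        // Everything after the first "/", as in the captured message.
        version: userAgent.slice(userAgent.indexOf("/") + 1),
      },
    },
  };
  const synthesisContext = {
    synthesis: {
      audio: {
        metadataOptions: {
          bookmarkEnabled: false,
          sentenceBoundaryEnabled: false,
          visemeEnabled: false,
          wordBoundaryEnabled: false,
        },
        outputFormat: outputFormat,
      },
      language: { autoDetection: false },
    },
  };
  return [
    frame("speech.config", requestId, "application/json", JSON.stringify(speechConfig)),
    frame("synthesis.context", requestId, "application/json", JSON.stringify(synthesisContext)),
    frame("ssml", requestId, "application/ssml+xml", ssml),
  ];
}
```

The same request id goes into all three messages, which is what later allows the incoming audio to be matched to the request.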

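4. Writing the code

  • WebSocket library: WebSocketSharp. If the latest version fails to install, install an older one; at the time of writing, the latest preview release was 1.0.3-rc11.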
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text;
using System.Security.Authentication;
using System.Threading;
using System.Web;
using System.Net;
using System.Text.RegularExpressions;
using WebSocketSharp;

namespace ConsoleTest
{
    internal class Program
    {
        static string UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49";

        // Download the demo page and extract the token and region embedded in it
        static Dictionary<string, string> GetToken()
        {
            var url = "https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech";
            var request = WebRequest.CreateHttp(url);
            request.Method = "GET";
            request.UserAgent = UserAgent;
            using (var response = request.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var rd = new StreamReader(stream))
            {
                var content = rd.ReadToEnd();
                var match = Regex.Match(content, @"localizedResources\s?=\s?{\r?\n\s+token:\s?""(?<token>.*?)"",\r?\n\s+region:\s?""(?<region>.*?)""");
                if (!match.Success)
                    throw new InvalidOperationException("token/region not found in the demo page");
                return new Dictionary<string, string>
                {
                    { "token", match.Groups["token"].Value },
                    { "region", match.Groups["region"].Value }
                };
            }
        }

        static void Main(string[] args)
        {
            var localres = GetToken();
            var AudioDelimiter = "Path:audio\r\n";
            var url = $"wss://{localres["region"]}.tts.speech.microsoft.com/cognitiveservices/websocket/v1?Authorization={HttpUtility.UrlPathEncode("bearer " + localres["token"])}";
            var dataBuffers = new Dictionary<string, List<byte>>();

            // Audio format
            var audioOutputFormat = "audio-24khz-160kbitrate-mono-mp3";
            // SSML parameters
            var Language = "en-US";
            var Voice = "zh-CN-XiaoxiaoNeural";
            var Rate = 0;
            var Pitch = 0;
            var msg = "Hello world";

            // Generate the requestId
            var sendRequestId = Guid.NewGuid().ToString().Replace("-", "").ToUpper();

            // Messages to send
            var speechconfig = $"Path: speech.config\r\nX-RequestId: {sendRequestId}\r\nContent-Type: application/json\r\n\r\n"
                + ("{'context':{'system':{'name':'SpeechSDK','version':'1.19.0','build':'JavaScript','lang':'JavaScript'},'os':{'platform':'Browser/Win32','name':'"
                + UserAgent + "','version':'" + UserAgent.Split("/".ToCharArray(), 2)[1] + "'}}}").Replace("'", "\"");
            var speechcontext = $"Path: synthesis.context\r\nX-RequestId: {sendRequestId}\r\nContent-Type: application/json\r\n\r\n"
                + ("{'synthesis':{'audio':{'metadataOptions':{'bookmarkEnabled':false,'sentenceBoundaryEnabled':false,'visemeEnabled':false,'wordBoundaryEnabled':false},'outputFormat':'" + audioOutputFormat + "'},'language':{'autoDetection':false}}}").Replace("'", "\"");
            var ssmltext = $"Path: ssml\r\nX-RequestId: {sendRequestId}\r\nContent-Type: application/ssml+xml\r\n\r\n"
                + $"<speak xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xmlns:emo='http://www.w3.org/2009/10/emotionml' version='1.0' xml:lang='{Language}'><voice name='{Voice}'><prosody rate='{Rate}%' pitch='{Pitch}%'>{msg}</prosody></voice></speak>";

            Console.WriteLine(url);
            var webSocket = new WebSocket(url);
            // Accepts any server certificate; acceptable for a quick test only
            webSocket.SslConfiguration.ServerCertificateValidationCallback = (sender, certificate, chain, sslPolicyErrors) => true;
            // Enable TLS 1.2; otherwise connecting fails with
            // System.Security.Authentication.AuthenticationException: A call to SSPI failed, see inner exception.
            // (The obsolete SSL 2.0 / TLS 1.0 / TLS 1.1 flags are not needed.)
            webSocket.SslConfiguration.EnabledSslProtocols = SslProtocols.Tls12;
            webSocket.OnOpen += (sender, e) => Console.WriteLine("[Log] WebSocket Open");
            webSocket.OnClose += (sender, e) => Console.WriteLine("[Log] WebSocket Close");
            webSocket.OnError += (sender, e) => Console.WriteLine("[Error] error message: " + e.Message);
            webSocket.OnMessage += (sender, e) =>
            {
                if (e.IsText)
                {
                    var data = e.Data;
                    var requestId = Regex.Match(data, @"X-RequestId:(?<requestId>.*?)\r?\n").Groups["requestId"].Value;
                    Console.WriteLine("- [" + requestId + "]:\n" + e.Data);
                    if (data.Contains("Path:turn.start"))
                    {
                        // Start of turn; nothing to do
                    }
                    else if (data.Contains("Path:turn.end"))
                    {
                        // End of turn; the socket can be closed from our side.
                        // dataBuffers[requestId] = null;
                        // Do not copy the line above from MsEdgeTTS: this text message
                        // arrives after all audio has been sent, and the buffer is still needed.
                        webSocket.Close();
                    }
                    else if (data.Contains("Path:response"))
                    {
                        // Context response; nothing to do
                    }
                    else
                    {
                        // Unknown message; normally does not happen
                        Console.WriteLine("unknown message: " + data);
                    }
                }
                else if (e.IsBinary)
                {
                    var data = e.RawData;
                    var message = Encoding.UTF8.GetString(e.RawData);
                    var requestId = Regex.Match(message, @"X-RequestId:(?<requestId>.*?)\r?\n").Groups["requestId"].Value;
                    Console.WriteLine("- [" + requestId + "]:\nbyte array size: " + data.Length);
                    if (!dataBuffers.ContainsKey(requestId))
                        dataBuffers[requestId] = new List<byte>();
                    if (data[0] == 0x00 && data[1] == 0x67 && data[2] == 0x58)
                    {
                        // Last (empty) audio fragment; marks the end of the audio
                    }
                    else
                    {
                        // The audio bytes start right after the "Path:audio\r\n" header line
                        var index = message.IndexOf(AudioDelimiter) + AudioDelimiter.Length;
                        dataBuffers[requestId].AddRange(data.Skip(index));
                        Console.WriteLine("buffer size: " + dataBuffers[requestId].Count);
                    }
                }
            };

            webSocket.Connect();
            Console.WriteLine("--- speech.config ---\n" + speechconfig);
            webSocket.Send(speechconfig);
            Console.WriteLine("--- synthesis.context ---\n" + speechcontext);
            webSocket.Send(speechcontext);
            Console.WriteLine("--- ssml ---\n" + ssmltext);
            webSocket.Send(ssmltext);

            // Wait for the socket to close without spinning the CPU
            while (webSocket.IsAlive) { Thread.Sleep(100); }

            List<byte> audio;
            if (dataBuffers.TryGetValue(sendRequestId, out audio))
                Console.WriteLine("Received audio length in bytes: " + audio.Count);
            else
                Console.WriteLine("No audio received for request " + sendRequestId);
            Console.ReadKey(true);
        }
    }
}
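The binary branch above finds the audio by searching for the Path:audio header line. The check for 0x00 0x67 followed by 0x58 ("X") suggests that each binary frame begins with a two-byte big-endian header length, followed by the header text and then the payload. The sketch below splits a frame on that assumption. The layout is inferred from the captured bytes rather than confirmed here, and parseBinaryFrame is a made-up name.

```javascript
// Hypothetical parser for one binary frame, given as a Uint8Array (or Buffer).
// Assumed layout: [2-byte big-endian header length][ASCII headers][payload bytes].
function parseBinaryFrame(data) {
  if (data.length < 2) throw new Error("frame too short");
  const headerLength = (data[0] << 8) | data[1];
  const headerEnd = 2 + headerLength;
  if (headerEnd > data.length) throw new Error("header length exceeds frame size");

  // Headers are plain "Name:Value" lines separated by CRLF.
  const headerText = String.fromCharCode(...data.subarray(2, headerEnd));
  const headers = {};
  for (const line of headerText.split("\r\n")) {
    const sep = line.indexOf(":");
    if (sep > 0) headers[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  // Everything after the headers is the payload (empty for the final fragment).
  return { headers: headers, body: data.subarray(headerEnd) };
}

// Example with a synthetic frame: two headers followed by three payload bytes.
const demoHeader = "X-RequestId:ABC\r\nPath:audio\r\n";
const demoFrame = Uint8Array.from([
  0, demoHeader.length,
  ...Array.from(demoHeader, (c) => c.charCodeAt(0)),
  1, 2, 3,
]);
const demoParsed = parseBinaryFrame(demoFrame);
console.log(demoParsed.headers.Path, demoParsed.body.length);
```

If the layout holds, a frame whose body is empty would correspond to the "last audio fragment" case that the C# code detects by its first three bytes.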

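5. Conclusion

The output format can be customized, but whenever the connection fails, a new token has to be fetched from the official page.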
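The token embedded in the demo page is a JWT, and the payload of the sample token shown earlier decodes to include an "exp" claim (1657542823). So, instead of waiting for a connection failure, the expiry can be read from the token itself. The sketch below assumes Node.js, since it relies on Buffer. The names tokenExpiresAt and needsRefresh and the 60-second margin are illustrative choices, and the signature is not verified.

```javascript
// Hypothetical helper: reads the "exp" claim (seconds since the Unix epoch) from a JWT.
function tokenExpiresAt(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a JWT");
  // JWT segments are base64url; map them back to the standard base64 alphabet.
  const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const claims = JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
  return claims.exp;
}

// True when the token is expired or will expire within marginSeconds.
function needsRefresh(token, nowSeconds, marginSeconds = 60) {
  return tokenExpiresAt(token) - marginSeconds <= nowSeconds;
}

// Example with a synthetic token carrying the same exp value as the sample.
const demoPayload = Buffer.from(JSON.stringify({ exp: 1657542823 }))
  .toString("base64")
  .replace(/=+$/, "")
  .replace(/\+/g, "-")
  .replace(/\//g, "_");
const demoToken = "header." + demoPayload + ".signature";
console.log(needsRefresh(demoToken, Math.floor(Date.now() / 1000)));
```

When needsRefresh returns true, the page would be fetched again to obtain a fresh token before the wss connection is opened.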
