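About me

Hi everyone, I'm VAIN. This is my first article on CSDN. Until now I have been writing on my Weibo blog; from here on I will post technical articles on CSDN, and I hope you will bear with me.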

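Project overview

Because of a requirement in a project at work, today I'm sharing how to build an AI assistant in Unity. There are few articles on this online, and the ones that exist are not very complete, so I hope this write-up helps.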

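Approach

1. Convert what we say into text (speech recognition).
2. Have the AI assistant understand what we said (human-machine dialogue).
3. Convert the assistant's reply into speech (speech synthesis).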
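The three steps above can be sketched as a small pipeline. This is only an illustration of the data flow, not the project's actual code: `AssistantPipeline` and its three delegates are hypothetical names standing in for the iFlytek recognition, Baidu UNIT and iFlytek synthesis calls.

```csharp
using System;

// Minimal sketch of the three-stage flow. The delegates are placeholders
// (assumptions) for the real services, so the class can be tried without any SDK.
public sealed class AssistantPipeline
{
    private readonly Func<byte[], string> recognize;   // step 1: speech -> text
    private readonly Func<string, string> chat;        // step 2: text -> reply text
    private readonly Func<string, byte[]> synthesize;  // step 3: reply text -> audio

    public AssistantPipeline(Func<byte[], string> recognize,
                             Func<string, string> chat,
                             Func<string, byte[]> synthesize)
    {
        if (recognize == null) throw new ArgumentNullException("recognize");
        if (chat == null) throw new ArgumentNullException("chat");
        if (synthesize == null) throw new ArgumentNullException("synthesize");
        this.recognize = recognize;
        this.chat = chat;
        this.synthesize = synthesize;
    }

    // Runs one round trip: recorded audio in, synthesized reply audio out.
    // Returns an empty array when nothing was recognized.
    public byte[] Respond(byte[] recordedAudio)
    {
        string question = recognize(recordedAudio);
        if (string.IsNullOrEmpty(question))
        {
            return new byte[0];
        }
        string answer = chat(question);
        return synthesize(answer);
    }
}
```

Keeping the stages behind delegates makes it easy to swap one vendor for another, which is what happens here with aiui being replaced by UNIT.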

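Preparation

I built this for PC, using both the iFlytek (讯飞) SDK and the Baidu SDK. As for why two SDKs: I didn't want to either, but I don't know C++. A small complaint: iFlytek's technical documentation really reads as if it were written for its own authors.
iFlytek: speech recognition and speech synthesis (Windows MSC)
Baidu: human-machine dialogue (UNIT)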

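Getting started

1. First, download the SDK from iFlytek. I won't cover the platform steps for getting it here; there are plenty of guides online.
2. Import it into the Unity project. msc is what speech recognition and speech synthesis use. aiui (human-machine dialogue) does not need to be imported, since it is not used: when I called the aiui interface, the entry points could never be found for some reason, which is why I use Baidu UNIT instead.

Note that an SDK you download yourself only works with your own appid. If you use an SDK that someone else downloaded, you have to use their appid.
3. The iFlytek SDK is written in C++, so to use it from C# we have to call it as an unmanaged DLL (P/Invoke).

Here are the declarations in full: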

using System;
using System.Runtime.InteropServices;

// P/Invoke declarations for the iFlytek MSC library (msc_x64.dll).
// The enums AudioStatus, EpStatus, RecogStatus and SynthStatus are defined elsewhere in the project.
public class MSCDLL
{
    #region Login / logout
    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int MSPLogin(string usr, string pwd, string parameters);

    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int MSPLogout();
    #endregion

    #region Speech recognition
    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern IntPtr QISRSessionBegin(string grammarList, string _params, ref int errorCode);

    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int QISRAudioWrite(IntPtr sessionID, byte[] waveData, uint waveLen, AudioStatus audioStatus, ref EpStatus epStatus, ref RecogStatus recogStatus);

    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern IntPtr QISRGetResult(IntPtr sessionID, ref RecogStatus rsltStatus, int waitTime, ref int errorCode);

    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int QISRSessionEnd(IntPtr sessionID, string hints);

    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int QISRBuildGrammar(IntPtr grammarType, string grammarContent, uint grammarLength, string _params, GrammarCallBack callback, IntPtr userData);

    // The user-data argument is a raw native pointer, so it is declared as IntPtr rather than object.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int GrammarCallBack(int errorCode, string info, IntPtr udata);

    // The original declaration loaded "msc.dll"; it now uses the same "msc_x64" library as every other entry point.
    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern IntPtr QISRUploadData(string sessionID, string dataName, byte[] userData, uint lenght, string paramValue, ref int errorCode);
    #endregion

    #region Voice wake-up
    // Callback definition
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int ivw_ntf_handler(string sessionID, int msg, int param1, int param2, IntPtr info, IntPtr userData);

    // Call QIVWSessionBegin(...) to start a wake-up session
    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern IntPtr QIVWSessionBegin(string grammarList, string _params, ref int errorCode);

    // Call QIVWAudioWrite(...) to write the audio data in chunks
    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int QIVWAudioWrite(string sessionID, byte[] waveData, uint waveLen, AudioStatus audioStatus);

    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int QIVWGetResInfo(string resPath, string resInfo, uint infoLen, string _params);

    // Call QIVWRegisterNotify(...) to register the callback with msc.
    // On a successful wake-up, msc invokes the callback with the wake-up data; on failure, it invokes the callback with the error information.
    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int QIVWRegisterNotify(string sessionID, [MarshalAs(UnmanagedType.FunctionPtr)] ivw_ntf_handler msgProcCb, IntPtr userData);

    // Call QIVWSessionEnd(...) to end the wake-up session explicitly
    [DllImport("msc_x64", CallingConvention = CallingConvention.StdCall)]
    public static extern int QIVWSessionEnd(string sessionID, string hints);
    #endregion

    #region Speech synthesis
    [DllImport("msc_x64", CallingConvention = CallingConvention.Winapi)]
    public static extern IntPtr QTTSSessionBegin(string _params, ref int errorCode);

    [DllImport("msc_x64", CallingConvention = CallingConvention.Winapi)]
    public static extern int QTTSTextPut(IntPtr sessionID, string textString, uint textLen, string _params);

    [DllImport("msc_x64", CallingConvention = CallingConvention.Winapi)]
    public static extern IntPtr QTTSAudioGet(IntPtr sessionID, ref uint audioLen, ref SynthStatus synthStatus, ref int errorCode);

    [DllImport("msc_x64", CallingConvention = CallingConvention.Winapi)]
    public static extern IntPtr QTTSAudioInfo(IntPtr sessionID);

    [DllImport("msc_x64", CallingConvention = CallingConvention.Winapi)]
    public static extern int QTTSSessionEnd(IntPtr sessionID, string hints);
    #endregion
}
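The post does not show how `MSPLogin` and `MSPLogout` are used, so here is a hedged sketch of wrapping any SDK work between the two. `MscSession`, `BuildLoginParams` and `Run` are hypothetical helpers, and the "appid = ..." parameter format and the success code 0 are assumptions taken from the vendor samples; check them against your SDK version. The login and logout calls are passed in as delegates so the helper can be exercised without the native DLL.

```csharp
using System;

// Hypothetical helper (not part of the iFlytek SDK): logs in, runs some work,
// and always logs out again.
public static class MscSession
{
    // Builds the string passed as the third argument of MSPLogin.
    // Assumption: the SDK expects the form "appid = <your appid>".
    public static string BuildLoginParams(string appId)
    {
        if (string.IsNullOrWhiteSpace(appId))
        {
            throw new ArgumentException("appId must not be empty", "appId");
        }
        return "appid = " + appId.Trim();
    }

    // login receives the parameter string and returns the SDK error code
    // (assumed: 0 means success); logout is called even if work throws.
    public static void Run(string appId, Func<string, int> login, Func<int> logout, Action work)
    {
        int ret = login(BuildLoginParams(appId));
        if (ret != 0)
        {
            throw new InvalidOperationException("MSPLogin failed, error code " + ret);
        }
        try
        {
            work();
        }
        finally
        {
            logout();
        }
    }
}
```

In the Unity project this could be called as `MscSession.Run("your appid", p => MSCDLL.MSPLogin(null, null, p), MSCDLL.MSPLogout, () => { /* recognition or synthesis */ });`.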

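4. Now for the speech recognition part. The official C# sample recognizes speech from an audio file, which is not what we want, so we modify it: skip the step that reads the audio file and take the audio data directly from the AudioClip (audio.clip).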

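// Needs: using System.Text; using System.Threading; using UnityEngine;
// QISRSessionBegin, QISRAudioWrite, QISRGetResult and QISRSessionEnd as called below take different
// arguments from the MSCDLL declarations, so they appear to be wrapper methods in the project (not shown in this post).

/// <summary>
/// Speech recognition.
/// </summary>
/// <param name="audio_content">Audio data as byte[]</param>
/// <param name="session_begin_params">Recognition parameters: language, domain, accent, ...</param>
/// For reference: "sub = iat, domain = iat, language = zh_cn, accent = mandarin, sample_rate = 16000, result_type = plain, result_encoding = utf-8"
private void AudioDiscern(byte[] audio_content, string session_begin_params)
{
    StringBuilder result = new StringBuilder();             // final recognition result
    var aud_stat = AudioStatus.MSP_AUDIO_SAMPLE_CONTINUE;   // audio status
    var ep_stat = EpStatus.MSP_EP_LOOKING_FOR_SPEECH;       // endpoint status
    var rec_stat = RecogStatus.MSP_REC_STATUS_SUCCESS;      // recognition status
    int errcode = (int)Errors.MSP_SUCCESS;
    int totalLength = 0;                                    // total length of the result so far, used to check against the buffer limit

    Debug.Log("Running speech recognition...");
    QISRSessionBegin(null, session_begin_params, ref errcode, ref session_id);
    if (errcode != (int)Errors.MSP_SUCCESS)
    {
        Debug.LogError("QISRSessionBegin failed, error code: " + errcode);
        return;
    }

    // Write the whole clip in one go, then signal the end of the audio.
    QISRAudioWrite(session_id, audio_content, (uint)audio_content.Length, aud_stat, ref ep_stat, ref rec_stat);
    QISRAudioWrite(session_id, null, 0, AudioStatus.MSP_AUDIO_SAMPLE_LAST, ref ep_stat, ref rec_stat);

    // Keep fetching results until recognition completes; also stop on an error so the loop cannot spin forever.
    while (rec_stat != RecogStatus.MSP_REC_STATUS_COMPLETE && errcode == (int)Errors.MSP_SUCCESS)
    {
        QISRGetResult(totalLength, result, session_id, ref rec_stat, 0, ref errcode);
        Thread.Sleep(150); // avoid hogging the CPU (note: this blocks the calling thread)
    }

    Debug.Log("Dictation finished.\nResult: " + result.ToString());
    Player_Audio_Value = result.ToString();
    QISRSessionEnd(session_id, "");
}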

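audio_content is the audio data we pass in. The remaining question is how to turn the AudioClip directly into a byte[]. See https://blog.csdn.net/qq_28745613/article/details/84874752 for reference, although that approach still needs optimizing.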
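One possible way to do the conversion is sketched below; it is not the code from the linked post. It assumes the recognizer expects 16-bit little-endian mono PCM at the sample rate given in the session parameters (16000 in the example string above), and `PcmConverter` is a hypothetical name. The method works on a plain `float[]`, which is what `AudioClip.GetData` fills.

```csharp
using System;

// Hypothetical helper: converts Unity-style float samples to 16-bit PCM bytes.
public static class PcmConverter
{
    // samples are expected in the range [-1, 1]; values outside are clamped.
    // Output is 2 bytes per sample, little-endian, signed 16-bit.
    public static byte[] FloatToPcm16(float[] samples)
    {
        if (samples == null) throw new ArgumentNullException("samples");

        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)Math.Round(clamped * short.MaxValue);
            bytes[i * 2] = (byte)(value & 0xFF);            // low byte first
            bytes[i * 2 + 1] = (byte)((value >> 8) & 0xFF); // then high byte
        }
        return bytes;
    }
}
```

In Unity the samples could come from `var samples = new float[clip.samples * clip.channels]; clip.GetData(samples, 0);`, with the clip recorded via `Microphone.Start(null, false, 10, 16000)` so that the sample rate matches `sample_rate = 16000`.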

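5. Next comes the dialogue with Baidu UNIT. We pass the recognized text to UNIT; the official samples explain this in reasonable detail.
The steps, roughly:
1. Set the request parameters. The parameters are JSON. I use the bot API here; note that the skill API and the bot API take different request parameters. Read the documentation carefully: I didn't, and ended up asking customer support why I was getting errors.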


{
  "log_id": "your own value",
  "version": "2.0",
  "service_id": "your own bot id",
  "session_id": "",
  "request": {
    "query": "What's the weather like today?",
    "user_id": "your own value"
  },
  "dialog_state": {
    "contexts": {
      "SYS_REMEMBERED_SKILLS": [""]
    }
  }
}
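When this body is assembled by string concatenation, a quote or backslash in the user's query produces invalid JSON. The sketch below shows one way to guard against that, assuming the same field layout as the request above; `UnitRequest`, `Escape` and `Build` are hypothetical names, not part of any Baidu SDK.

```csharp
using System.Text;

// Hypothetical helper that builds the UNIT bot request body with escaped values.
public static class UnitRequest
{
    // Escapes a string so it can sit between double quotes in JSON.
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s ?? "")
        {
            switch (c)
            {
                case '"': sb.Append("\\\""); break;
                case '\\': sb.Append("\\\\"); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default:
                    if (c < 0x20) sb.Append("\\u").Append(((int)c).ToString("x4"));
                    else sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    // Field names mirror the request shown above; all values are caller-supplied.
    public static string Build(string serviceId, string sessionId, string query,
                               string userId, string logId, string rememberedSkill)
    {
        return "{"
            + "\"log_id\":\"" + Escape(logId) + "\","
            + "\"version\":\"2.0\","
            + "\"service_id\":\"" + Escape(serviceId) + "\","
            + "\"session_id\":\"" + Escape(sessionId) + "\","
            + "\"request\":{\"query\":\"" + Escape(query) + "\",\"user_id\":\"" + Escape(userId) + "\"},"
            + "\"dialog_state\":{\"contexts\":{\"SYS_REMEMBERED_SKILLS\":[\"" + Escape(rememberedSkill) + "\"]}}"
            + "}";
    }
}
```

Speech recognition output rarely contains quotes, but escaping costs little and keeps one odd transcript from turning into a confusing server-side error.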

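Send the request parameters with a POST request to https://aip.baidubce.com/rpc/2.0/unit/service/chat?access_token= followed by your own token. The response is also JSON.
If you are not sure how to use it, this post is very helpful: https://ai.baidu.com/forum/topic/show/944007 (if only iFlytek's documentation were like Baidu's).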
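The token itself comes from Baidu's OAuth endpoint. The sketch below only builds the two URLs involved; `BaiduAuth` and its methods are hypothetical names, and the token endpoint and its `grant_type`, `client_id` and `client_secret` parameters are given from memory of the Baidu AI platform documentation, so verify them there before relying on this.

```csharp
using System;

// Hypothetical helper for the URLs used when talking to Baidu UNIT.
public static class BaiduAuth
{
    // Assumed OAuth endpoint of the Baidu AI platform.
    public const string TokenEndpoint = "https://aip.baidubce.com/oauth/2.0/token";

    // URL to request an access token with the API Key and Secret Key of your app.
    // The JSON response is expected to carry the token in an "access_token" field.
    public static string BuildTokenUrl(string apiKey, string secretKey)
    {
        return TokenEndpoint
            + "?grant_type=client_credentials"
            + "&client_id=" + Uri.EscapeDataString(apiKey)
            + "&client_secret=" + Uri.EscapeDataString(secretKey);
    }

    // Chat URL from this article with the token appended.
    public static string BuildChatUrl(string accessToken)
    {
        return "https://aip.baidubce.com/rpc/2.0/unit/service/chat?access_token="
            + Uri.EscapeDataString(accessToken);
    }
}
```

Tokens are valid for a limited time, so it is usually worth caching one rather than requesting a new token before every chat call.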

// Needs: using System.IO; using System.Net; using System.Text; using UnityEngine;
// session_id and origin are string fields kept between calls; GetAccessToken() and
// StringToJson.Utterance are defined elsewhere in the project (not shown in this post).

/// <summary>
/// Send a request and return the reply.
/// </summary>
/// <param name="query">The user's question</param>
/// <returns>The reply text ("say") of the first action of the first response</returns>
public static string UNIT_Utterance(string query)
{
    string token = GetAccessToken();
    string host = "https://aip.baidubce.com/rpc/2.0/unit/service/chat?access_token=" + token;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host);
    request.Method = "POST";
    request.ContentType = "application/json";
    request.KeepAlive = true;

    // Request body; fill in log_id, service_id and user_id yourself.
    // Note: query is inserted without JSON escaping, so a quote or backslash in it breaks the body.
    string str = "{\"log_id\":\"\",\"version\":\"2.0\",\"service_id\":\"\",\"session_id\":\"" + session_id + "\",\"request\":{\"query\":\"" + query + "\",\"user_id\":\"\"},\"dialog_state\":{\"contexts\":{\"SYS_REMEMBERED_SKILLS\":[\"" + origin + "\"]}}}";

    // Send the body as UTF-8 bytes.
    byte[] buffer = Encoding.UTF8.GetBytes(str);
    request.ContentLength = buffer.Length;
    using (Stream body = request.GetRequestStream())
    {
        body.Write(buffer, 0, buffer.Length);
    }

    // Read the JSON response.
    string result;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        result = reader.ReadToEnd();
    }

    // Parse the JSON.
    StringToJson.Utterance value = JsonUtility.FromJson<StringToJson.Utterance>(result);

    // Keep session_id for multi-turn dialogue.
    session_id = value.result.session_id;
    // Take the origin of the highest-priority response.
    origin = value.result.response_list[0].origin;

    //for (int i = 0; i < value.result.response_list.Count; i++)
    //{
    //    for (int j = 0; j < value.result.response_list[i].action_list.Count; j++)
    //    {
    //        // Print every returned say and confidence
    //        Debug.Log(value.result.response_list[i].action_list[j].say + value.result.response_list[i].action_list[j].confidence);
    //    }
    //}

    // Return the say of the highest-priority response.
    return value.result.response_list[0].action_list[0].say;
}
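The `StringToJson.Utterance` type is not shown in the post. The sketch below is a guess at a minimal shape covering only the fields the method above reads (`session_id`, `response_list`, `origin`, `action_list`, `say`, `confidence`); the class names and the type of `confidence` are assumptions. It also swaps in a different selection rule from the article: instead of taking the first action of the first response, `BestSay` scans every action for the highest confidence and returns a fallback when the response carries no actions.

```csharp
using System;
using System.Collections.Generic;

// Assumed data classes; public fields keep them usable with Unity's JsonUtility.
[Serializable] public class UnitAction { public string say; public float confidence; }
[Serializable] public class UnitResponse { public string origin; public List<UnitAction> action_list = new List<UnitAction>(); }
[Serializable] public class UnitResult { public string session_id; public List<UnitResponse> response_list = new List<UnitResponse>(); }
[Serializable] public class UnitReply { public UnitResult result; }

public static class UnitReplyPicker
{
    // Returns the say with the highest confidence, or fallback when there is none
    // (for example when the server returned an error and result is missing).
    public static string BestSay(UnitReply reply, string fallback)
    {
        if (reply == null || reply.result == null || reply.result.response_list == null)
        {
            return fallback;
        }

        UnitAction best = null;
        foreach (UnitResponse response in reply.result.response_list)
        {
            if (response == null || response.action_list == null) continue;
            foreach (UnitAction action in response.action_list)
            {
                if (action == null) continue;
                if (best == null || action.confidence > best.confidence)
                {
                    best = action;
                }
            }
        }
        return best != null ? best.say : fallback;
    }
}
```

Indexing `response_list[0].action_list[0]` directly throws when either list is empty, so a guarded lookup like this gives the assistant something to say even when the request fails.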

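6. The last step is to run speech synthesis on the returned text. For now the synthesized speech is saved to an audio file and then loaded into the AudioClip (audio.clip).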

// Needs: using System; using System.IO; using System.Runtime.InteropServices; using System.Text; using System.Threading; using UnityEngine;
// QTTSSessionBegin, QTTSTextPut and QTTSSessionEnd as called below appear to be wrapper methods in the project (not shown in this post).
private byte[] bytes;

private void CreateAudio(string speekText, string szParams)
{
    QTTSSessionBegin(ref session_id, szParams, ref err_code);
    QTTSTextPut(session_id, speekText, (uint)Encoding.Default.GetByteCount(speekText), string.Empty);

    uint audio_len = 0;
    SynthStatus synth_status = SynthStatus.MSP_TTS_FLAG_STILL_HAVE_DATA;
    MemoryStream memoryStream = new MemoryStream();
    memoryStream.Write(new byte[44], 0, 44); // reserve 44 bytes for the WAV header

    // Pull synthesized audio chunk by chunk until the SDK reports the end of data or an error.
    while (true)
    {
        IntPtr source = MSCDLL.QTTSAudioGet(session_id, ref audio_len, ref synth_status, ref err_code);
        if (audio_len > 0 && source != IntPtr.Zero)
        {
            byte[] array = new byte[audio_len];
            Marshal.Copy(source, array, 0, (int)audio_len);
            memoryStream.Write(array, 0, array.Length);
        }
        Thread.Sleep(150);
        if (synth_status == SynthStatus.MSP_TTS_FLAG_DATA_END || err_code != (int)Errors.MSP_SUCCESS)
            break;
    }
    QTTSSessionEnd(session_id, "");

    WAVE_Header header = getWave_Header((int)memoryStream.Length - 44); // build the WAV header
    byte[] headerByte = StructToBytes(header);                          // convert the header struct to bytes
    memoryStream.Position = 0;                                          // seek back to the start
    memoryStream.Write(headerByte, 0, headerByte.Length);               // write the header
    bytes = memoryStream.ToArray();
    memoryStream.Close();

    if (AI_audio_url != null)
    {
        if (File.Exists(AI_audio_url))
        {
            File.Delete(AI_audio_url);
        }
        File.WriteAllBytes(AI_audio_url, bytes);
        StartCoroutine(OnAudioLoadAndPaly(AI_audio_url, audio_type, AI.GetComponent<AudioSource>()));
    }
}

/// <summary>
/// Convert a struct to a byte array.
/// </summary>
/// <param name="structure">The struct to convert</param>
/// <returns>The raw bytes of the struct</returns>
private byte[] StructToBytes(object structure)
{
    int num = Marshal.SizeOf(structure);
    IntPtr intPtr = Marshal.AllocHGlobal(num);
    byte[] result;
    try
    {
        Marshal.StructureToPtr(structure, intPtr, false);
        byte[] array = new byte[num];
        Marshal.Copy(intPtr, array, 0, num);
        result = array;
    }
    finally
    {
        Marshal.FreeHGlobal(intPtr);
    }
    return result;
}

/// <summary>
/// Build the WAV header for 16 kHz, 16-bit, mono PCM.
/// </summary>
/// <param name="data_len">Length of the audio data in bytes</param>
/// <returns>The filled-in header</returns>
private WAVE_Header getWave_Header(int data_len)
{
    return new WAVE_Header
    {
        RIFF_ID = 1179011410,       // "RIFF"
        File_Size = data_len + 36,
        RIFF_Type = 1163280727,     // "WAVE"
        FMT_ID = 544501094,         // "fmt "
        FMT_Size = 16,
        FMT_Tag = 1,                // PCM
        FMT_Channel = 1,
        FMT_SamplesPerSec = 16000,
        AvgBytesPerSec = 32000,
        BlockAlign = 2,
        BitsPerSample = 16,
        DATA_ID = 1635017060,       // "data"
        DATA_Size = data_len
    };
}

/// <summary>
/// WAV file header.
/// </summary>
private struct WAVE_Header
{
    public int RIFF_ID;
    public int File_Size;
    public int RIFF_Type;
    public int FMT_ID;
    public int FMT_Size;
    public short FMT_Tag;
    public ushort FMT_Channel;
    public int FMT_SamplesPerSec;
    public int AvgBytesPerSec;
    public ushort BlockAlign;
    public ushort BitsPerSample;
    public int DATA_ID;
    public int DATA_Size;
}
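The integer constants in the header are just four-character ASCII tags read as little-endian integers (1179011410 is "RIFF", for example). As an alternative sketch, not the author's method, the same 44-byte header can be written with `BinaryWriter`, which spells the tags out and avoids struct marshalling. `WavWriter.Wrap` is a hypothetical name, and the defaults assume the 16 kHz, 16-bit, mono format used in this article.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: prepends a canonical 44-byte PCM WAV header to raw samples.
public static class WavWriter
{
    public static byte[] Wrap(byte[] pcm, int sampleRate = 16000, short channels = 1, short bitsPerSample = 16)
    {
        if (pcm == null) throw new ArgumentNullException("pcm");

        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;                   // bytes per second

        using (var ms = new MemoryStream(44 + pcm.Length))
        using (var w = new BinaryWriter(ms)) // BinaryWriter always writes little-endian
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + pcm.Length);                 // file size minus the first 8 bytes
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                              // size of the fmt chunk
            w.Write((short)1);                        // format tag 1 = PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(pcm.Length);                      // size of the audio data
            w.Write(pcm);
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

With this helper the synthesis loop could collect only the raw audio chunks and call `WavWriter.Wrap` once at the end, instead of reserving 44 bytes up front and seeking back.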

That completes the basic AI assistant functionality.
Project: https://download.csdn.net/download/weixin_42208093/12446368
Demo: https://pan.baidu.com/s/1nmd2j_FCSsB8sSIN-u3eNA (extraction code: 5r1m)
Remember to swap in your own SDK, appid and other parameters.
