INTERSPEECH 2021 | Two AISHELL (希尔贝壳) Papers Accepted at the World's Top Speech Conference

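As the top international conference in the speech field, INTERSPEECH has long been a focus of both academia and industry. The conference covers all aspects of speech and language processing and its applications, as well as the latest advances in speech-related fields. INTERSPEECH 2021 takes place from August 30 to September 3 and is organized by the International Speech Communication Association (ISCA). This year's conference uses a hybrid format, online and in person in Brno, Czech Republic. To make exchange easier for researchers around the world, every accepted paper this year can be presented by video.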

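Two AISHELL papers accepted

Each year INTERSPEECH receives submissions from upwards of a thousand research institutions and companies worldwide, yet only a limited share is accepted. At INTERSPEECH 2021, both papers submitted by AISHELL, "AISHELL-3: A Multi-speaker Mandarin TTS Corpus" and "AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario", were accepted by the conference.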

Paper 1

Title: "AISHELL-3: A Multi-speaker Mandarin TTS Corpus"

Download: https://arxiv.org/abs/2010.11567

Authors: Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, Ming Li

Affiliations:

School of Computer Science, Wuhan University, Wuhan, China
Data Science Research Center, Duke Kunshan University, Kunshan, China
Beijing Shell Shell Technology Co., Ltd, Beijing, China

Abstract:

In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Mandarin Chinese speakers. Their auxiliary attributes such as gender, age group and native accents are explicitly marked and provided in the corpus. Accordingly, transcripts in Chinese character-level and pinyin-level are provided along with the recordings. We present a baseline system that uses AISHELL-3 for multi-speaker Mandarin speech synthesis. The multi-speaker speech synthesis system is an extension on Tacotron-2 where a speaker verification model and a corresponding loss regarding voice similarity are incorporated as the feedback constraint. We aim to use the presented corpus to build a robust synthesis model that is able to achieve zero-shot voice cloning. The system trained on this dataset also generalizes well on speakers that are never seen in the training process. Objective evaluation results from our experiments show that the proposed multi-speaker synthesis system achieves high voice similarity concerning both speaker embedding similarity and equal error rate measurement. The dataset, baseline system code and generated samples are available online.

INTERSPEECH presentation information:

Paper 2

Title:

"AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario"

Download: https://arxiv.org/abs/2104.03603

Authors:

Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

Affiliations:

Northwestern Polytechnical University, Xi’an, China
Microsoft Corporation, USA
Microsoft Corporation, China
Beijing Shell Shell Technology Co., Ltd., Beijing, China
University of Science and Technology of China, Hefei, China

Abstract:

In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected by an 8-channel circular microphone array for speech processing in conference scenarios. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. This dataset aims to bridge the advanced research on multi-speaker processing and the practical application scenario in three aspects. With real recorded meetings, AISHELL-4 provides realistic acoustics and rich natural speech characteristics in conversation such as short pause, speech overlap, quick speaker turn, noise, etc. Meanwhile, accurate transcription and speaker voice activity are provided for each meeting in AISHELL-4. This allows the researchers to explore different aspects in meeting processing, ranging from individual tasks such as speech front-end processing, speech recognition and speaker diarization, to multi-modality modeling and joint optimization of relevant tasks. Given that most open source datasets for multi-speaker tasks are in English, AISHELL-4 is the only Mandarin dataset for conversational speech, providing additional value for data diversity in the speech community. We also release a PyTorch-based training and evaluation framework as a baseline system to promote reproducible research in this field.

INTERSPEECH presentation information:

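AISHELL's open-source projects have become a benchmark for open data in speech technology. Together they now form an open-source portfolio that pairs intelligent speech technology with data, covering speech recognition, speaker (voiceprint) recognition, speech synthesis, and scenario-specific speech application solutions.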

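AISHELL will keep investing in open source, using technology to lead the growth of its data business and data to drive the maturity of the technology industry. Going forward, it intends to serve developers and researchers with cutting-edge databases and to lower the cost companies face when deploying algorithms. It also plans to combine more open-source data with education, R&D, and products so that the technology reaches more real-world scenarios. AISHELL still has more work to do toward democratizing artificial intelligence.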
