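INTERSPEECH is a top international conference in the speech field and has long been a focus for both academia and industry. It covers every aspect of spoken language processing and its applications, along with frontier advances in speech-related fields. INTERSPEECH 2021 was held from August 30 to September 3. It was organized by the International Speech Communication Association (ISCA) and ran in a hybrid format, online and on site in Brno, Czech Republic. To make exchange easier for researchers around the world, every accepted paper this year could be presented by video.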

Two AISHELL Papers Accepted

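Each year INTERSPEECH receives submissions from more than a thousand research institutions and companies worldwide, yet only a limited number are accepted. At INTERSPEECH 2021, both papers submitted by AISHELL (Beijing Shell Shell Technology) were accepted: "AISHELL-3: A Multi-speaker Mandarin TTS Corpus" and "AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario".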

Paper 1

Title: "AISHELL-3: A Multi-speaker Mandarin TTS Corpus"

Download: https://arxiv.org/abs/2010.11567

Authors: Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, Ming Li

Affiliations:

  • School of Computer Science, Wuhan University, Wuhan, China

  • Data Science Research Center, Duke Kunshan University, Kunshan, China

  • Beijing Shell Shell Technology Co., Ltd, Beijing, China

Abstract:

In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Chinese Mandarin speakers. Their auxiliary attributes such as gender, age group and native accents are explicitly marked and provided in the corpus. Accordingly, transcripts in Chinese character-level and pinyin-level are provided along with the recordings. We present a baseline system that uses AISHELL-3 for multi-speaker Mandarin speech synthesis. The multi-speaker speech synthesis system is an extension of Tacotron-2 where a speaker verification model and a corresponding loss regarding voice similarity are incorporated as the feedback constraint. We aim to use the presented corpus to build a robust synthesis model that is able to achieve zero-shot voice cloning. The system trained on this dataset also generalizes well on speakers that are never seen in the training process. Objective evaluation results from our experiments show that the proposed multi-speaker synthesis system achieves high voice similarity concerning both speaker embedding similarity and equal error rate measurement. The dataset, baseline system code and generated samples are available online.
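The abstract reports voice similarity in terms of speaker embedding similarity and equal error rate (EER). As a rough illustration of what these two metrics measure, here is a minimal sketch that scores verification trials with cosine similarity between embeddings and estimates the EER by sweeping a decision threshold. This is not the paper's evaluation code: the function names are hypothetical, the speaker verification model that would produce the embeddings is assumed, and the toy scores are made up purely for illustration.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(scores, labels) -> float:
    """Approximate EER: the point where the false-acceptance rate (impostor
    trials accepted) equals the false-rejection rate (target trials rejected).

    scores: one similarity score per trial (higher = more likely same speaker)
    labels: 1 for same-speaker (target) trials, 0 for impostor trials
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = labels.sum()
    n_impostor = len(labels) - n_target
    best_gap, eer = np.inf, 1.0
    # Try each observed score as the threshold; a trial is accepted if score >= t.
    for t in np.unique(scores):
        accept = scores >= t
        far = np.sum(accept & (labels == 0)) / n_impostor
        frr = np.sum(~accept & (labels == 1)) / n_target
        # Keep the threshold where FAR and FRR are closest; report their mean.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)


# Toy trial scores (hypothetical); real scores would be cosine similarities
# between embeddings of synthesized and reference speech.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(equal_error_rate(scores, labels))
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```

A lower EER means target and impostor trials are easier to separate. When EER is used to judge synthesized speech, a higher EER between synthetic and real recordings of the same speaker can instead indicate that the verifier has trouble telling them apart, so which direction counts as "better" depends on how the trials are constructed.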


Paper 2

Title: "AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario"

Download: https://arxiv.org/abs/2104.03603

Authors: Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

Affiliations:

  • Northwestern Polytechnical University, Xi’an, China

  • Microsoft Corporation, USA

  • Microsoft Corporation, China

  • Beijing Shell Shell Technology Co., Ltd., Beijing, China

  • University of Science and Technology of China, Hefei, China

Abstract:

In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected with an 8-channel circular microphone array for speech processing in conference scenarios. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. This dataset aims to bridge the advanced research on multi-speaker processing and practical application scenarios in three aspects. With real recorded meetings, AISHELL-4 provides realistic acoustics and rich natural speech characteristics in conversation such as short pauses, speech overlap, quick speaker turns, noise, etc. Meanwhile, accurate transcription and speaker voice activity are provided for each meeting in AISHELL-4. This allows researchers to explore different aspects of meeting processing, ranging from individual tasks such as speech front-end processing, speech recognition and speaker diarization, to multi-modality modeling and joint optimization of relevant tasks. Given that most open-source datasets for multi-speaker tasks are in English, AISHELL-4 is the only Mandarin dataset for conversational speech, providing additional value for data diversity in the speech community. We also release a PyTorch-based training and evaluation framework as a baseline system to promote reproducible research in this field.


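AISHELL's open-source projects have become a benchmark for open data in speech technology. AISHELL now offers an open-source portfolio that pairs intelligent speech technology with data, covering speech recognition, speaker recognition, speech synthesis, and scenario-specific intelligent speech applications.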

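AISHELL will keep investing in open source, using technology to lead the growth of its data business and using data to help the technology industry mature. Going forward, it aims to serve developers and researchers with cutting-edge databases and to lower the cost companies face in deploying algorithms. It also plans to combine more open-source data with education, R&D, and products so that the technology reaches more real-world scenarios. To help democratize artificial intelligence, AISHELL still has more work to do.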
