The forum "Programming Languages and Compilers for AI Chips" will be held during CNCC on October 24, 13:30–15:30, in the Sichuan Room on the 2nd floor of the Hotel Nikko New Century Beijing. The forum brings together well-known scholars from home and abroad and industry leaders to discuss the challenges and opportunities of designing domain-customized chips for artificial intelligence. You are welcome to attend!

As Moore's Law slows down, domain-specific architectures have become the mainstream direction of processor development. To meet deep learning's enormous demand for compute, hardware companies have released a variety of domain-specific AI chips, such as Cambricon's processors, Huawei's Ascend series, and Alibaba's Hanguang series. Developing automated compilation technology for AI chips is of great significance to the advancement of China's AI chip industry.

This forum will address the following questions:

1) How should domain-specific programming languages for AI chips be designed?

2) How should efficient compilers for AI chips be designed?

3) What are the main pain points of today's programming languages and compilers for AI chips?

4) How can the design of domestically developed core system software, such as programming languages and compilers, be strengthened?

Forum Chairs

Jidong Zhai (翟季冬)

Tenured Associate Professor and Ph.D. advisor in the Department of Computer Science and Technology at Tsinghua University; Secretary-General of the ACM China High-Performance Computing Expert Committee and a BAAI (Beijing Academy of Artificial Intelligence) Young Scientist. His research focuses on high-performance computing and compiler optimization, with results published at major international conferences and in journals including SC, PPoPP, ICS, MICRO, ASPLOS, ATC, CGO, NSDI, IEEE TPDS, and IEEE TC. His SC14 paper was a Best Paper Finalist, the first such nomination for a scholar from mainland China. He served as program chair of NPC 2018, as a program committee member of SC 2018/2019/2020 and PPoPP 2019/2020/2021, as an associate editor of IEEE TPDS, and as a young associate editor of FCS and JCST. As coach of Tsinghua's student supercomputing team, he has led the team to nine world championships; in 2015 and 2018 the team swept the SC, ISC, and ASC international supercomputing competitions, achieving a "grand slam." He has received the Ministry of Education Science and Technology Progress First Prize, the CCF Outstanding Doctoral Dissertation Award, and an NSFC Excellent Young Scientists Fund grant.

Wenguang Chen (陈文光)

Professor and Ph.D. advisor in the Department of Computer Science and Technology at Tsinghua University; CCF Distinguished Member and Distinguished Speaker, CCF Deputy Secretary-General, and honorary member of CCF YOCSEF. His research covers operating systems, programming languages, and parallel computing. He has repeatedly served on the program committees of major international conferences in high-performance and parallel computing, including OSDI, PPoPP, CGO, SC, ICS, PLDI, ASPLOS, and APSYS. He also chairs the ACM China Council and ChinaSys, the ACM China chapter on operating systems. He has received the State Science and Technology Progress Second Prize, the State Education Commission Science and Technology Progress Second Prize, and the Beijing Science and Technology Progress Second Prize, and is a recipient of the National Science Fund for Distinguished Young Scholars.

Speakers

Zhenjiang Hu (胡振江)

Chair Professor at Peking University, Deputy Dean of the School of Electronics Engineering and Computer Science, and Chair of the Department of Computer Science and Technology. He received his Ph.D. in information engineering from the University of Tokyo in 1996, and was previously a professor in the Graduate School of Information Science and Technology at the University of Tokyo, a professor and department head at the National Institute of Informatics in Japan, and a Changjiang Chair Professor at Peking University. Professor Hu has long worked on programming languages and on software science and engineering, with a series of pioneering contributions to programming language design, structured functional programming, automatic program synthesis and optimization, parallel programming, the design and implementation of bidirectional transformation languages, and software evolution and maintenance. He received Japan's best doctoral dissertation award and the Basic Research Achievement Award of the Japan Society for Software Science and Technology, and is a Fellow of the Japan Federation of Engineering Societies, a Member of Academia Europaea, an IEEE Fellow, and an ACM Distinguished Scientist.

Talk title: From Chip Customization to Language Customization: Systematic Customization of Programming Languages and Its Supporting Environment

Abstract: With Moore's Law gradually failing and the pressing demand for efficient specialized computation such as deep learning, we are moving toward an era that favors specialized, customized computing devices. Software therefore needs to be customizable for different specialized hardware. This talk presents the basic concepts and applications of systematic customization of programming languages, discusses the implementation of its supporting environment, and explores the challenges ahead.

Tianqi Chen (陈天奇)

Assistant Professor, Carnegie Mellon University

Tianqi Chen is an Assistant Professor in the Machine Learning Department and the Computer Science Department at Carnegie Mellon University. He received his Ph.D. from the Paul G. Allen School of Computer Science & Engineering at the University of Washington, working with Carlos Guestrin at the intersection of machine learning and systems. He has created three widely adopted learning systems: XGBoost, TVM, and MXNet (as co-creator). He is a recipient of the Google Ph.D. Fellowship in Machine Learning.

Talk title: TVM: An Automated Deep Learning Compiler

Abstract: Data, models, and computing are the three pillars that enable machine learning to solve real-world problems at scale. Making progress in these three domains requires not only disruptive algorithmic advances but also systems innovations that can continue to squeeze more efficiency out of modern hardware. Learning systems are at the center of every intelligent application nowadays. However, the ever-growing demand for applications and hardware specialization creates a huge engineering burden for these systems, most of which rely on heuristics or manual optimization. In this talk, I will present a new approach that uses machine learning to automate system optimizations. I will describe our approach in the context of deep learning deployment problems. I will first discuss how to design invariant representations that lead to transferable statistical cost models, and apply these representations to optimize the tensor programs used in deep learning applications. I will then describe the system improvements we made to enable diverse hardware backends. TVM, our end-to-end system, delivers performance across hardware backends that is competitive with state-of-the-art, hand-tuned deep learning frameworks.
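The cost-model-guided search described above can be sketched in miniature. This is a hypothetical toy, not TVM's actual API: the feature set, the 256-element cache limit, and the cost formula are all invented for illustration.

```python
# Toy sketch of statistical-cost-model-guided schedule search: enumerate
# candidate tilings of a loop nest, score each with a cost model, keep the best.
import itertools

def features(tile_i, tile_j):
    """Hypothetical schedule features: tile footprint and loop overhead."""
    footprint = tile_i * tile_j              # elements touched per tile (a proxy)
    overhead = 1.0 / tile_i + 1.0 / tile_j   # relative loop bookkeeping cost
    return footprint, overhead

def cost_model(feats):
    """Stand-in for a learned model: penalize tiles that overflow a
    hypothetical 256-element cache, plus the loop overhead."""
    footprint, overhead = feats
    cache_penalty = max(0, footprint - 256) * 0.01
    return overhead + cache_penalty

def search(sizes=(4, 8, 16, 32, 64)):
    """Rank every candidate tiling with the cost model and keep the cheapest."""
    candidates = itertools.product(sizes, sizes)
    return min(candidates, key=lambda t: cost_model(features(*t)))

print(search())  # → (16, 16): the largest balanced tile that fits the modeled cache
```

In a real auto-tuner the cost model is trained on measured runtimes and the search space covers loop ordering, vectorization, and parallelization, not just tile sizes.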

Zhihao Jia (贾志豪)

Assistant Professor, Carnegie Mellon University

Zhihao Jia is an incoming Assistant Professor of Computer Science at CMU (starting Fall 2021). He received his Ph.D. from Stanford, working with Alex Aiken and Matei Zaharia. His research interests lie at the intersection of computer systems and machine learning, with a focus on building efficient, scalable, and high-performance systems for ML computations.

Talk title: Automated Discovery of Machine Learning Optimizations

Abstract: As an increasingly important workload, machine learning (ML) applications require different performance optimization techniques from traditional runtimes and compilers. In particular, to accelerate ML applications it is generally necessary to perform ML computations on heterogeneous hardware and to parallelize computations across multiple data dimensions, neither of which is even expressible in traditional compilers and runtimes. In this talk, I will describe my work on automated discovery of performance optimizations to accelerate ML computations. TASO, the Tensor Algebra SuperOptimizer, optimizes the computation graphs of deep neural networks (DNNs) by automatically generating potential graph optimizations and formally verifying their correctness. TASO outperforms rule-based graph optimizers in existing ML systems (e.g., TensorFlow, TensorRT, and TVM) by up to 3x by automatically discovering novel graph optimizations, while requiring significantly less human effort. FlexFlow is a system for accelerating distributed DNN training. FlexFlow identifies parallelization dimensions not considered in existing ML systems (e.g., TensorFlow and PyTorch) and automatically discovers fast parallelization strategies for a specific parallel machine. Companies and national labs are using FlexFlow to train production ML models that do not scale well in current ML systems, achieving over 10x performance improvements. I will also outline future research directions for further automating ML systems, such as co-designing ML models, software systems, and hardware backends for end-to-end ML deployment.
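TASO's generate-and-verify loop can be illustrated with a toy graph rewriter. This is a heavily simplified sketch: the `rewrite_distribute` rule is an invented stand-in for TASO's generated substitutions, and TASO verifies rewrites formally, whereas this toy only tests them on random inputs.

```python
# Toy superoptimizer-style graph rewriting: propose a candidate rewrite of a
# computation graph, then validate it against the original graph.
import random

# Computation graphs as nested tuples ('add'|'mul', lhs, rhs), or a variable name.
def evaluate(node, env):
    if isinstance(node, str):
        return env[node]
    op, lhs, rhs = node
    a, b = evaluate(lhs, env), evaluate(rhs, env)
    return a + b if op == 'add' else a * b

def is_mul(n):
    return isinstance(n, tuple) and n[0] == 'mul'

def rewrite_distribute(node):
    """Candidate rewrite: x*z + y*z  ->  (x+y)*z (one fewer multiply)."""
    if (isinstance(node, tuple) and node[0] == 'add'
            and is_mul(node[1]) and is_mul(node[2])
            and node[1][2] == node[2][2]):
        return ('mul', ('add', node[1][1], node[2][1]), node[1][2])
    return node

def equivalent(g1, g2, names, trials=100):
    """Randomized validation; TASO proper verifies rewrites formally instead."""
    for _ in range(trials):
        env = {n: random.uniform(-10, 10) for n in names}
        if abs(evaluate(g1, env) - evaluate(g2, env)) > 1e-6:
            return False
    return True

g = ('add', ('mul', 'x', 'z'), ('mul', 'y', 'z'))
g_opt = rewrite_distribute(g)
print(g_opt)                                  # ('mul', ('add', 'x', 'y'), 'z')
print(equivalent(g, g_opt, ['x', 'y', 'z']))  # True
```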

Huimin Cui (崔慧敏)

Professor and Ph.D. advisor at the Institute of Computing Technology, Chinese Academy of Sciences. Her research focuses on compilation and programming for heterogeneous chips; in recent years she has worked on compiler and programming-environment optimization for emerging workloads such as big data and AI on heterogeneous architectures. As principal investigator she has led several national-level projects, including NSFC grants and National Key R&D Program projects, and has published more than twenty papers at international conferences and in journals including PLDI, MICRO, PPoPP, and TPDS.

Talk title: Programming Language and Compiler Design for High-Performance Intelligent Processors

Abstract: High-performance intelligent processors, exemplified by the Cambricon platform, provide a general-purpose deep learning platform that aims to deliver powerful compute for current and future intelligent applications. Given the diversity and unpredictability of future applications, a foundational high-level programming language is an indispensable part of building and promoting the ecosystem. To meet this need, we designed Bang, a general-purpose high-level programming language based on C and tailored to both applications and the platform, which enables flexible development of user-defined operators. On top of this, deep compiler optimization is applied to fully exploit the processing power of the chip.

Wei Lin (林伟)

Senior Director, Alibaba

Wei Lin is Senior Director of the Platform of Artificial Intelligence (PAI) and Chief Architect of the big-data computation platform at Alibaba. He has more than 15 years of experience in backend infrastructure and distributed systems, spanning storage and large-scale computation systems for batch, streaming, and machine learning workloads.

Talk title: AI Compiler at Alibaba

Abstract: With emerging AI workloads and the diversity of computing hardware, AI compilers play a vital role in bridging the gap between model expressiveness and high-performance system implementation. In this talk, we will share our experience applying an AI compiler in Alibaba's production environment, including: 1. Large-scale deployment of our AI compiler in PAI (Platform of Artificial Intelligence) production clusters, running stably for more than six months and saving tens of thousands of GPU hours. We will describe our aggressive fusion and co-design strategy, in which a cost-based approach finds the optimal fusion plan to boost hardware efficiency, and share the lessons learned in enabling the compiler by default in a large-scale production cluster. 2. Ansor, our automatic code generation framework, which was accepted at OSDI 2020 and is deployed in our production environment. Compared with existing search strategies, Ansor explores many more optimization combinations and can therefore find high-performance programs that lie outside the search space of existing state-of-the-art approaches. Our evaluation shows that Ansor improves the execution performance of deep neural networks on Intel CPUs, ARM CPUs, and NVIDIA GPUs by up to 3.8x, 2.6x, and 1.7x, respectively. 3. Our thoughts on the future direction of AI compilers from an industry perspective, such as the interplay between compilers, runtimes, resource scheduling, and distributed execution. We would also like to raise some open questions in the hope of spurring interaction between academia and industry.
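The cost-based fusion decision mentioned above can be sketched with a deliberately crude memory-traffic model. All names and cost formulas here are hypothetical, not PAI's actual model.

```python
# Toy cost-based fusion planning for a chain of elementwise ops: fuse when the
# modeled memory traffic of one fused kernel beats running the ops separately.

def unfused_cost(n_ops, tensor_bytes):
    """Each elementwise op reads one input and writes one output to memory."""
    return n_ops * 2 * tensor_bytes

def fused_cost(n_ops, tensor_bytes):
    """One fused kernel reads the input and writes the output once;
    intermediates are assumed to stay in registers."""
    return 2 * tensor_bytes

def plan(n_ops, tensor_bytes):
    """Fuse only when the modeled traffic strictly decreases."""
    if fused_cost(n_ops, tensor_bytes) < unfused_cost(n_ops, tensor_bytes):
        return 'fuse'
    return 'no-fuse'

print(plan(3, 4096))  # → fuse: one kernel instead of three saves 4 tensor transfers
print(plan(1, 4096))  # → no-fuse: nothing to gain for a single op
```

A production fusion planner would also model kernel launch overhead, register pressure, and the legality of each fusion, and would search over fusion groupings rather than a single chain.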

Shin-Ming Liu

Chief Architect, Xcalibyte

Shin-Ming Liu is Chief Architect at Xcalibyte. He began as a compiler developer in the early 1980s, has built compilation systems from scratch at various Silicon Valley companies, and has had wide influence on modern compilation systems, including the design of gcc and llvm. Beyond in-depth compiler development, he has served as Director of the Java and C/C++ toolchain lab for HP-UX servers and Director of the HP kernel development lab for the HP 3PAR storage system, and has developed extensive insight into the computing ecosystem for high-performance computing and software development productivity.

Talk title: Matrix Multiply, from 1x to 62,806x Speedup: Bridging the Gap between Productivity and Performance

Abstract: In his Stanford lecture, John Hennessy discussed a new era of computing and the challenge of improving GEMM by roughly 63,000 times. We will elaborate on his vision with a deep dive into the compilation and runtime techniques needed, and suggest a possible roadmap for bringing both productivity and performance into it. We argue for an open-source platform that lets multiple languages coexist in compilation and runtime while allowing individual chip and accelerator vendors to specialize for their target domains. We will analyze the technical challenges ahead and possible directions forward for a thriving industry in AI and data science.
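To make the ~63,000x challenge concrete, here is the classic starting point of that ladder in pure Python: the same GEMM written naively, and then with the loop order changed for locality. This is only a small sketch; the large speedups in Hennessy's example come from moving further down the ladder to vectorized, cache-blocked, and hardware-specialized implementations.

```python
# The same matrix multiply, naive (i-j-k) and locality-friendly (i-k-j).

def matmul_naive(A, B):
    """Textbook triple loop: the inner loop strides down columns of B."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_reordered(A, B):
    """i-k-j order walks B and C row-wise: better locality, and the first
    rung of the optimization ladder before blocking and vectorization."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        Ci, Ai = C[i], A[i]
        for k in range(m):
            a, Bk = Ai[k], B[k]
            for j in range(p):
                Ci[j] += a * Bk[j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_naive(A, B))                            # [[19.0, 22.0], [43.0, 50.0]]
print(matmul_naive(A, B) == matmul_reordered(A, B))  # True
```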

Click "Read the original" to register for the forum.
