Data + AI Summit 2022 Ultra-HD Video Downloads

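Data + AI Summit 2022 was held from June 27 to 30, 2022, in San Francisco, and attendees in China could follow it online. The event ran for four days: the first day was training, and the remaining days were the conference proper. There were more than 200 sessions, with speakers drawn from industry, research, and academia. The conference was organized into six tracks: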

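•Data Analytics, BI, and Visualization: learn about the latest data analytics, BI, and visualization technologies, along with customer and community solutions.
•Data Engineering: from implementing data pipelines to managing data quality, ETL and data quality frameworks, and DataOps, dig into the latest data engineering practice.
•Data Lakes, Data Warehouses and Data Lakehouses: learn the concepts and best practices behind the evolution of data lakes and data warehouses into Data Lakehouses.
•Data Science, Machine Learning, and MLOps: learn the techniques and best practices for taking data science and machine learning pipelines to production.
•Data Security and Governance
•Academic Research: devoted to academic and advanced industrial research, including large-scale schedulers, graphs, data analytics, and machine learning systems.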

For the full conference agenda, see: https://databricks.com/dataaisummit/agenda

To keep up with articles on Spark, Hadoop, or HBase, follow the WeChat official account 过往记忆大数据.

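The Day 1 keynote made several important announcements: the future direction of Apache Spark, a next-generation Structured Streaming solution, and the open-sourcing of all Delta Lake features. Beyond the Day 1 keynote, the following sessions are also worth watching: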

•Apache Spark SQL Aggregate Improvement at Meta (Facebook)

•Recent Parquet Improvements in Apache Spark

•Spark Data Source V2 Performance Improvement: Aggregate Push Down

•Deep Dive into the New Features of Apache Spark 3.2 and 3.3

•Managing Straggler Executors at Apache Spark 3.3

•Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors

•PySpark in Apache Spark 3.3 and Beyond

•Delta Lake 2.0 Overview

•Improving Interactive Querying Experience on Spark SQL

•Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest

•Radical Speed on the Lakehouse: Photon Under the Hood

How to Download the Ultra-HD Videos

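Since readers will be interested in different topics, all of the downloadable videos are collected here, every one in Ultra-HD, so you can pick the ones you want to watch. The conference slides are not yet available for download; if you need them, keep following this official account for updates.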

Follow the WeChat official account 过往记忆大数据 or Java与大数据架构 and reply 10187 to get the Data + AI Summit 2022 Ultra-HD videos.

Sessions with Downloadable Videos

A total of 197 sessions have downloadable videos.

•A Low-Code Approach to 10x Data Engineering
•A Modern Approach to Big Data for Finance
•A Practitioner's Guide to Unity Catalog—A Technical Deep Dive
•Accelerating Hybrid Data Mesh Implementation
•Accidentally Building a Petabyte-Scale Cybersecurity Data Mesh in Azure With Delta Lake at HSBC
•Administrator Best Practices and Tips for Future-proofing your Databricks Account
•Advanced Migrations From Hive to SparkSQL
•Adversarial Drifts Model Monitoring and Feedback Loops Building Human-in-the-Loop Machine Learning Systems for Content Moderation
•An Advanced S3 Connector for Spark to Hunt for Cyber Attacks
•Analyzing Population Health using Healthcare Claims
•Apache Arrow Flight SQL: High Performance, Simplicity, and Interoperability for Data Transfers
•Apache Spark SQL Aggregate Improvement at Meta Facebook
•Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
•Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh
•Automating Model Lifecycle Orchestration with Jenkins
•Backfill Streaming Data Pipelines in Kappa Architecture
•Batches Streams and Everything in between Unifying Batch and Stream Storage with Apache Pulsar and Lakehouse Architectures
•Beyond Daily Batch Processing Operational Trade-Offs of Microbatch Incremental and Real-Time Processing for Your ETLs (and Your Team's Sanity)
•Beyond Monitoring The Rise of Data Observability
•Build an Enterprise Lakehouse for Free with Trino and Delta Lake
•Building Enterprise Scale Data and Analytics Platforms at Amgen
•Building Metadata and Lineage Driven Pipelines on Kubernetes
•Building Production-Ready Recommender Systems with Feature Stores
•Building Scalable & Advanced AI based Language Solutions for R&D using Databricks
•Building Spatial Applications with Apache Spark and CARTO
•Building a Lakehouse for Data Science at DoorDash
•Building a Lakehouse on AWS for Less with AWS Graviton and Photon
•Building an Operational Machine Learning Organization from Zero and Leveraging ML for Crypto Security
•Building and Scaling Machine Learning-Based Products in the World's Largest Brewery
•Chaos Engineering in the World of Large-Scale Complex Data Flow
•Cloud Native Geospatial Analytics at JLL
•Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks
•Complete Data Security and Governance Powered by Unity Catalog and Immuta
•Connecting the Dots with DataHub Lakehouse and Beyond
•Constraints, Democratization, and the Modern Data Stack - Building a Data Platform At Red Ventures with Fivetran and Databricks
•Coral and Transport Portable SQL and UDFs for the Interoperability of Spark and Other Engines
•Correlation Over Causation Cracking the Relationship Between User Engagement and User Happiness
•Cutting the Edge in Fighting Cybercrime Reverse-Engineering a Search Language to Cross-Compile it to PySpark
•DELETE UPDATE MERGE Operations in Data Source V2
•Data Lakehouse and Data Mesh—Two Sides of the Same Coin
•Data Mesh Implementation Patterns
•Data Warehousing on the Lakehouse
•DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine
•Databricks Lakehouse Overview
•Databricks SQL Under the Hood: What's New with Live Demos
•Day 1 Afternoon Keynote
•Day 2 Afternoon Keynote
•Day 2 Opening Keynote
•Deep Dive How to Build Your Modern Data Stack on Databricks to Solve Modern Problems
•Deep Dive into the New Features of Apache Spark 3.2 and 3.3
•Deliver Faster Decision Intelligence From Your Lakehouse
•Delta Lake, the Foundation of Your Lakehouse
•Delta Live Tables Modern software engineering and management for ETL
•Delta Sharing - A New Paradigm for Secure Data Sharing and Data Collaboration on Lakehouse
•Delta Sharing for Healthcare and Life Sciences
•Democratizing Metrics at Airbnb
•Designing Better MLOps Systems
•Destination Lakehouse All Your Data Analytics and AI on One Platform
•Distributed Machine Learning at Lyft
•Dive Deeper into Data Engineering on Databricks
•Doubling the Capacity of the Data Platform Without Doubling the Cost
•Driving Real-Time Data Capture and Transformation in Delta Lake with Change Data Capture
•Efficient and Multi-Tenant Scheduling of Big Data and AI Workloads
•Eliminating AI Risk—One Model Failure at a Time
•Emerging Data Architectures & Approaches for Real-Time AI using Redis
•Enable Production ML with Databricks Feature Store
•Enabling Advanced Analytics at The Department of State using Databricks
•Enabling BI in a Lakehouse Environment How Spark and Delta Can Help With Automating a DWH Development
•Enabling Business Users to Perform Interactive Ad-Hoc Analysis over Delta Lake with No Code
•Ensuring Correct Distributed Writes to Delta Lake in Rust with Formal Verification
•Entity Resolution
•Evolution of Data Architectures and How to Build a Lakehouse
•Financial Services Industry Forum: The Future of Financial Services is Open with Data and AI at Its Core
•Fugue Tune Distributed Hybrid Hyperparameter Tuning
•FugueSQL—The Enhanced SQL Interface for Pandas and Spark DataFrames
•FutureMetrics Using Deep Learning to Create a Multivariate Time Series Forecasting Platform for Economic Strategic Planning
•Gamer User Toxicity
•Gazelle-Jni: A Middle Layer to Offload Spark SQL to Native Engines for Execution Acceleration
•Government Industry Forum Lunch and Program
•Hassle-Free Data Ingestion into the Lakehouse
•Healthcare Data Interoperability
•How AARP Services Inc. automated SAS transformation to Databricks using LeapLogic—A cloud accelerator for transformation of legacy analytics ETL DW & Hadoop
•How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks with Two Different Approaches using Photon and RAPIDS Accelerator for Apache Spark
•How Databricks is driving disruptive digital transformation in the airline industry
•How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities
•How McAfee Leverages Databricks on AWS at Scale
•How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins
•How To Make Apache Spark on Kubernetes Run Reliably on Spot Instances
•How To Use Databricks SQL for Analytics on Your Lakehouse
•How to Implement a Semantic Layer for Your Lakehouse
•How unsupervised machine learning can scale data quality monitoring in Databricks
•Immuta - Unlocking sensitive use cases with automated data access
•Implementing a Framework for Data Security and Policy at a Large Public Sector Agency
•Implementing an End-to-End Demand Forecasting Solution Through Databricks and MLflow
•Improving Apache Spark Structured Streaming Application Processing Time by Configurations Code Optimizations and Custom Data Source
•Improving Interactive Querying Experience on Spark SQL
•Introducing Zipline An Open Source Feature Engineering Platform
•Introduction to Flux and OSS Replication
•Lakehouse with Delta Lake Deep Dive
•Laying the Foundation for Claims Automation
•Learn to Efficiently Test ETL Pipelines
•Lessons Learned from Deidentifying 700 Million Patient Notes
•Leveraging ML-Powered Analytics for Rapid Insights and Action a demonstration
•Live Analytics: The next user engagement frontier
•Low-Code Machine Learning on Databricks with AutoML
•ML on the Lakehouse Bringing Data and ML Together to Accelerate AI Use Cases
•MLOps at DoorDash
•MLflow Pipelines Accelerating MLOps from Development to Production
•Managing Straggler Executors at Apache Spark 3.3
•Meetup Women in Data and AI
•Meshing About with Databricks
•Migrate Your Existing DAGs to Databricks Workflows
•Migrate and Modernize your Data Platform with Confluent and Databricks
•Migrating Complex SAS Processes to Databricks - Case Study
•Migrating SAS to a Lakehouse on Databricks and S3
•Monitoring and Quality Assurance of Complex ML Deployments via Assertions
•More Context Less Chaos How Atlan and Unity Catalog Power Column-Level Lineage and Active Metadata
•Mosaic: A Framework for Geospatial Analytics at Scale
•Moving from Apache Spark 2 to Apache Spark 3 Spark Version Upgrade at Scale in Pinterest
•Multi-Touch Attribution
•Multimodal Deep Learning Applied to E-commerce Big Data
•Near Real-Time Analytics with Event Streaming Live Tables and Delta Sharing
•Nixtla: Deep Learning for Time Series Forecasting
•Opening the Floodgates Enabling Fast Unmediated End User Access to Trillion-Row Datasets with SQL Data Warehouses
•Operational Analytics: Expanding the Reach of Data in the Lakehouse Era
•Optimizing Speed and Scale of User-Facing Analytics Using Apache Kafka and Pinot
•Orchestration Made Easy with Databricks Workflows
•OvalEdge End-To-End Data Governance
•Patient Cohort Building with NLP and Knowledge Graphs
•Powering Up the Business with a Lakehouse
•Practical Data Governance in a Large Scale Databricks Environment
•Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning
•Predicting and Preventing Machine Downtime with AI and Expert Alerts
•Propensity Scoring Demo
•Protecting Personally Identifiable Information (PII)/PHI Data in Data Lake via Column Level Encryption
•Pushing the limits of scale and performance for enterprise-wide analytics: A fire-side chat with Akamai
•PySpark in Apache Spark 3.3 and Beyond
•Radical Speed on the Lakehouse Photon Under the Hood
•Real Time Bidding
•Real Time Retail Demo
•Real World Evidence and Propensity Score Matching
•Real-Time Search and Recommendation at Scale Using Embeddings and Hopsworks
•Real-time Risk Management with Confluent & Databricks
•Realize the Promise of Streaming with the Databricks Lakehouse Platform
•Recent Parquet Improvements in Apache Spark
•Regulatory Reporting: Automatically translate enterprise data models into efficient data pipelines
•Retail Industry Forum
•Rethinking Orchestration as Reconciliation Software-Defined Assets in Dagster
•Running a Low Cost Versatile Data Management Ecosystem with Apache Spark at Core
•SAS Migration
•Scaling AI Workloads with the Ray Ecosystem
•Scaling Deep Learning on Databricks
•Scaling ML at CashApp with Tecton
•Scaling Salesforce In-Memory Streaming Analytics Platform for Trillion Events Per Day
•Scaling Your Workloads with Databricks Serverless
•Search and Aggregations Made Easy with OpenSearch and NodeJS
•Serverless Kafka and Apache Spark in a Multi-Cloud Data Lakehouse Architecture
•Serving Near Real-Time Features at Scale
•Simplifying Migrations to Lakehouse—the Databricks Way
•Sink Framework Evolution in Apache Flink
•Smart Manufacturing Real-time Process Optimization with Databricks
•So Fresh and So Clean: Learn How to Build Real-Time Warehouses on Lakehouse
•Sound Data Engineering in Rust—From Bits to DataFrames
•Spark Inception: Exploiting the Apache Spark REPL to Build Streaming Notebooks
•Stadium Analytics
•Streaming Data into Delta Lake with Rust and Kafka
•Streaming ML Enrichment Framework Using Advanced Delta Table Features
•Supercharge your SaaS applications with a modern cloud-native database
•Survey of Production ML Tech Stacks
•Tackling Challenges of Distributed Deep Learning with Open Source Solutions
•Take Databricks Lakehouse to the Max with Informatica
•Technical and Tactical Football Analysis Through Data
•The Databricks Notebook Front Door of the Lakehouse
•The Future is Open - a Look at Google Cloud’s Open Data Ecosystem
•The Future of Data - What’s Next with Google Cloud
•The Road to a Robust Data Lake Utilizing Delta Lake and Databricks to Map 150 Million Miles of Roads a Month
•Tools for Assisted Apache Spark Version Migrations From 2.1 to 3.2+
•Towards Dynamic Microstructure The Role of Machine Learning in the Next Generation of Exchanges
•Tredence On Shelf Availability
•Turbocharge your AI/ML Databricks workflows with Precisely
•Turning Fan Data Into an Asset
•Unifying Data Science and Business Artificial Intelligence Augmentation and Integration into Production Business Applications
•What to Do When Your Job Goes OOM in the Night Flowcharts
•Why a Data Lakehouse is Critical During the Manufacturing Apocalypse
•You Have BI. Now What Activate Your Data
•Your fastest path to Lakehouse and beyond
•dbt + Machine Learning What Makes a Great Baton Pass
•dbt and Databricks: Analytics Engineering on the Lakehouse