Getting started with Azure Data Catalog

This article talks about Azure Data Catalog and how data professionals can use it to locate, understand and consume data sources.

As the name suggests, it is a service in Azure that helps users organize, discover, and register data sources. This fully managed cloud service acts as a central, shared place where an organization's developers, analysts, data scientists, and other users can contribute their knowledge and locate, understand, and consume data.

Azure Data Catalog does not move your data; the data stays in its existing location. Instead, a copy of its structural and descriptive metadata is added to the catalog, along with a reference to the data source's location. This metadata is indexed, which makes the data easy to search.
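
The following minimal Python sketch models this idea under stated assumptions: a catalog entry holds only metadata plus a reference to where the data lives, and that metadata is tokenized into an inverted index for search. It is a toy illustration, not how Azure Data Catalog is implemented; `CatalogEntry`, `MetadataIndex`, and the location string format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one data asset; the data itself stays at `location`."""
    name: str
    location: str  # reference to the source, not a copy of the data
    columns: list[str] = field(default_factory=list)  # structural metadata
    description: str = ""  # descriptive metadata


class MetadataIndex:
    """Toy inverted index from lowercase tokens to registered asset names."""

    def __init__(self) -> None:
        self.entries: dict[str, CatalogEntry] = {}
        self.index: dict[str, set[str]] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Only the metadata is kept; no rows are read from or copied out of the source.
        self.entries[entry.name] = entry
        tokens = [entry.name, *entry.columns, *entry.description.split()]
        for token in tokens:
            self.index.setdefault(token.lower(), set()).add(entry.name)

    def search(self, term: str) -> list[str]:
        # Exact token match, case-insensitive.
        return sorted(self.index.get(term.lower(), set()))


if __name__ == "__main__":
    catalog = MetadataIndex()
    catalog.register(CatalogEntry(
        name="SalesOrderHeader",
        location="mssql://myserver.database.windows.net/mysqldb/SalesLT.SalesOrderHeader",
        columns=["SalesOrderID", "OrderDate", "CustomerID"],
        description="Order headers",
    ))
    print(catalog.search("orderdate"))  # ['SalesOrderHeader']
```

Registering an entry never touches the rows at `location`; searching only consults the index built from names, column names, and description words.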

Why do we need an Azure Data Catalog?

  • Companies generate and store huge amounts of data every day, and as this data grows quickly, discovering data sources becomes challenging for both data producers and data consumers.
  • Creating and maintaining documentation for large data sources is complex and time-consuming.
  • A certain amount of tribal knowledge (information that is known only within a company) exists within an organization, and it can be challenging for newcomers to acquire all of it. Azure Data Catalog addresses this by providing a platform for sharing information about the data, which makes data sources easy to discover and understand.
  • With Data Catalog, developers no longer have to spend time searching for data using complex queries.

The Azure Data Catalog process

Working with Azure Data Catalog usually involves the following steps:

  1. Create a data catalog: provisioning the catalog is the first step.
  2. Register and annotate assets: users register their data sources and annotate them with tags, documentation, and understandable descriptions.
  3. Discover and consume assets: users can easily search and filter assets by their indexed metadata.
  4. Connect to data: users can connect to the data and pull it into tools such as Excel, Power BI, and SSDT.

Important points to remember when working with Azure Data Catalog

To set up a data catalog, you must be an owner or co-owner of an Azure subscription.

Only one Data Catalog is supported per organization (i.e. per tenant) and you cannot have additional catalogs even if you have multiple subscriptions.
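
One way to picture this rule is a registry keyed by tenant rather than by subscription. The Python sketch below is a hypothetical model (`CatalogRegistry` and the tenant and subscription IDs are made up for illustration); Azure enforces the real limit itself.

```python
from typing import Optional


class CatalogRegistry:
    """Toy model of the rule: at most one data catalog per tenant."""

    def __init__(self) -> None:
        # tenant ID -> (subscription ID, catalog name)
        self._catalogs: dict[str, tuple[str, str]] = {}

    def create_catalog(self, tenant_id: str, subscription_id: str, name: str) -> None:
        # The check is keyed on the tenant only, so using a different
        # subscription in the same tenant does not allow a second catalog.
        if tenant_id in self._catalogs:
            existing = self._catalogs[tenant_id][1]
            raise ValueError(f"tenant {tenant_id!r} already has catalog {existing!r}")
        self._catalogs[tenant_id] = (subscription_id, name)

    def catalog_for(self, tenant_id: str) -> Optional[str]:
        entry = self._catalogs.get(tenant_id)
        return entry[1] if entry else None
```

Creating `OurSalesData` in one tenant and then trying again from a second subscription in the same tenant raises `ValueError` in this model.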

Data Catalog supports only work or school accounts, so you need a work or school account to create a data catalog in Azure.

Now let's see Azure Data Catalog in action.

This article assumes that you have basic knowledge of Azure, are familiar with Azure SQL Database, and have an Azure subscription.

How to create an Azure Data Catalog?

You can create a data catalog through the Azure portal like any other Azure resource. In the portal, search for Data Catalog and enter a name for your data catalog. You also have to specify the subscription, the location for the catalog, and the pricing tier (Free or Standard edition). Then select Create. Finally, go to the Azure Data Catalog home page and select Publish Data.

Alternatively, go to the Azure Data Catalog provisioning page and enter the data catalog name, the subscription you want to use, and the location for the catalog, as shown below.

Scroll down a little to select the pricing tier; the service is offered in two editions. For this demo, I am selecting the FREE EDITION.

I am keeping the defaults for the remaining settings. Your ID is automatically added as a catalog user and administrator, and you can add further catalog users and catalog administrators. Finally, click Create Catalog to create a data catalog named OurSalesData in Azure.

The data catalog is created successfully, and you can view it in the Azure portal as shown below. A resource group named DataCatalogs-EastUS is created automatically, and the catalog resides in it. Note also that I already have a SQL server and a SQL database resource in my account.

Click the data catalog to view its properties; you can also edit them there.

Launch the desktop application to register your data sources in Azure Data Catalog

Back on the Data Catalog page, after you click the Create Catalog button above, you are taken to the screen below.

There are two ways to register, or publish, your data sources in Data Catalog: Launch Application and Create Manual Entry. I personally do not prefer Create Manual Entry, because entering everything by hand would be challenging and time-consuming for larger data sources. It is better to go with Launch Application, since it is just a click-once application.

Install the application.

Once the application is installed, you are taken to the Sign in page. Sign in with the same credentials that you used to access the catalog in the portal.

Selecting a data source

Next, select the data source that you want to register in your data catalog.

You can register many types of data sources in Data Catalog, such as SQL Server, Reporting Services, HDFS, Hive, HANA databases, and Azure Data Lake Analytics, as shown below. Since I already have a SQL database in my account, I will use SQL Server as the data source. Click SQL Server and select NEXT.

Provide the SQL Server name, the authentication type, and the database that you want to register (mysqldb, in this case), then click CONNECT.
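
The server name, authentication type, and database are the same inputs any client needs to reach the database. As an illustration, the hypothetical helper below assembles an ODBC connection string for Azure SQL Database using SQL Server authentication. It assumes the Microsoft ODBC Driver 18 for SQL Server, the default port 1433, and placeholder names (`myserver`, `demo_user`); it is not part of the Data Catalog registration tool.

```python
def _odbc_value(value: str) -> str:
    # Values containing ';', '{' or '}' are wrapped in braces,
    # with any '}' doubled, so they cannot break the key=value list.
    if any(ch in value for ch in ";{}"):
        return "{" + value.replace("}", "}}") + "}"
    return value


def build_azure_sql_connection_string(
    server: str,
    database: str,
    user: str,
    password: str,
    driver: str = "ODBC Driver 18 for SQL Server",
) -> str:
    """ODBC connection string for Azure SQL Database with SQL Server authentication."""
    # Accept either the short server name or the fully qualified one.
    if not server.endswith(".database.windows.net"):
        server = f"{server}.database.windows.net"
    parts = {
        "Driver": "{" + driver + "}",
        "Server": f"tcp:{server},1433",
        "Database": _odbc_value(database),
        "Uid": _odbc_value(user),
        "Pwd": _odbc_value(password),
        "Encrypt": "yes",
        "TrustServerCertificate": "no",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    print(build_azure_sql_connection_string("myserver", "mysqldb", "demo_user", "<password>"))
```

A library such as pyodbc accepts a string like this in `pyodbc.connect(...)`. Azure AD authentication would need different keywords and is not covered by this sketch.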

Register a data source in Azure Data Catalog

Expand your database and select SalesLT; all the objects you can register in your data catalog are listed under Available objects. I selected all of them using the double right arrow (>>). Also select the Include Preview option so that you can preview sample data later.
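
Conceptually, registering an object captures its structural metadata (column names and types) and, with Include Preview, a small sample of rows. The sketch below illustrates that idea with Python's built-in `sqlite3` module as a stand-in for SQL Server; SQLite has no SalesLT schema and exposes columns through `PRAGMA table_info`. The function `extract_metadata` and the sample rows are hypothetical, and this is not how the registration tool works internally.

```python
import sqlite3


def extract_metadata(conn: sqlite3.Connection, table: str, preview_rows: int = 0) -> dict:
    """Return column names/types for `table` and, optionally, a few sample rows."""
    # Quote the identifier; assumes table names come from a trusted listing.
    quoted = '"' + table.replace('"', '""') + '"'
    columns = [
        {"name": row[1], "type": row[2]}  # PRAGMA table_info rows: (cid, name, type, ...)
        for row in conn.execute(f"PRAGMA table_info({quoted})")
    ]
    meta = {"table": table, "columns": columns}
    if preview_rows > 0:
        # Mirrors the Include Preview idea: keep a small sample, not the whole table.
        meta["preview"] = conn.execute(
            f"SELECT * FROM {quoted} LIMIT ?", (preview_rows,)
        ).fetchall()
    return meta


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER, OrderDate TEXT, CustomerID INTEGER)"
    )
    # Made-up rows for the demo only.
    conn.executemany(
        "INSERT INTO SalesOrderHeader VALUES (?, ?, ?)",
        [(1, "2020-01-05", 10), (2, "2020-02-11", 11), (3, "2020-03-20", 12)],
    )
    print(extract_metadata(conn, "SalesOrderHeader", preview_rows=2))
```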

The objects are now registered, and you can register more objects using the 'register more objects' option. For now, click VIEW PORTAL to discover our data.

How to discover and annotate data sources in an Azure Data Catalog

Suppose we want to find information related to orders in the database. Type 'order' in the search bar, and you will find two SQL Server tables related to orders.
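
A minimal sketch of this kind of lookup, assuming `mysqldb` holds the AdventureWorksLT sample database (the article does not say so; the list below is that sample's SalesLT tables, views omitted). The search here is a plain case-insensitive substring match on table names, a simplification of the catalog's metadata search.

```python
# Assumption: tables in the SalesLT schema of the AdventureWorksLT sample (views omitted).
SALESLT_TABLES = [
    "Address", "Customer", "CustomerAddress", "Product", "ProductCategory",
    "ProductDescription", "ProductModel", "ProductModelProductDescription",
    "SalesOrderDetail", "SalesOrderHeader",
]


def search_assets(term: str, names: list[str]) -> list[str]:
    """Case-insensitive substring search over registered asset names."""
    needle = term.casefold()
    return [name for name in names if needle in name.casefold()]


if __name__ == "__main__":
    print(search_assets("order", SALESLT_TABLES))  # ['SalesOrderDetail', 'SalesOrderHeader']
```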

You can further annotate this data asset in the Properties tab by providing a friendly name (I typed OrdersIn2020), a description, the experts on the data, and so on, as shown below.
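
The properties entered here can be thought of as a small record attached to the registered asset. The Python sketch below is a hypothetical model: `AssetAnnotation`, its fields, the choice of `SalesOrderHeader` as the table renamed OrdersIn2020, and the expert address are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AssetAnnotation:
    """Hypothetical record of the properties a user adds to one data asset."""
    asset_name: str  # name as registered from the source
    friendly_name: str = ""  # e.g. "OrdersIn2020"
    description: str = ""
    experts: list[str] = field(default_factory=list)

    def display_name(self) -> str:
        # Show the friendly name when one has been set, else the source name.
        return self.friendly_name or self.asset_name

    def matches(self, term: str) -> bool:
        # In this model, the annotations are searched along with the source name.
        text = " ".join([self.asset_name, self.friendly_name, self.description]).casefold()
        return term.casefold() in text


if __name__ == "__main__":
    orders = AssetAnnotation(
        "SalesOrderHeader",
        friendly_name="OrdersIn2020",
        description="Orders placed in 2020",
        experts=["data.owner@example.com"],
    )
    print(orders.display_name())  # OrdersIn2020
```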

Click the Preview icon to view a sample of the data the table contains.

In the Columns tab, we can also add meaningful descriptions and tags to every column in the table. This not only helps users find where an attribute is located but also explains what the attribute means.
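
Column-level annotations can likewise be modeled as a description plus a set of tags per column, which makes a question like "which columns are tagged order?" easy to answer. `ColumnAnnotation`, `columns_with_tag`, and the sample descriptions and tags are hypothetical, assumed only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnAnnotation:
    """Hypothetical description and tags attached to one column."""
    description: str = ""
    tags: set[str] = field(default_factory=set)


def columns_with_tag(annotations: dict[str, ColumnAnnotation], tag: str) -> list[str]:
    """Names of the columns carrying `tag`, sorted alphabetically."""
    return sorted(col for col, ann in annotations.items() if tag in ann.tags)


if __name__ == "__main__":
    order_columns = {
        "OrderDate": ColumnAnnotation("Date the order was placed", {"date", "order"}),
        "CustomerID": ColumnAnnotation("Key of the ordering customer", {"key", "customer"}),
        "TotalDue": ColumnAnnotation("Order total including tax and freight", {"money", "order"}),
    }
    print(columns_with_tag(order_columns, "order"))  # ['OrderDate', 'TotalDue']
```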

At times, tags and descriptions are not enough to provide a clear understanding of the data asset. To make it more understandable for data consumers, you can add documentation related to this data asset in the Documentation tab as shown below. This will help provide a complete and detailed explanation of data assets.

How to connect to data sources in an Azure Data Catalog

Once we have registered, located, and annotated the data, we can also connect to the data source through the Data Catalog service, which offers several connection options. Click the 'Open In …' icon in the horizontal tile: you can open the data source in Excel, SSDT, or Power BI.

To connect this data source in Power BI Desktop (provided Power BI Desktop is installed on the client computer), click the Power BI Desktop option from the contextual menu.

Data users can now view, analyze and visualize their data in the Power BI Desktop app as shown below.

See the Microsoft documentation to learn more about the Data Catalog service in Azure.

Conclusion

In this short article, we discussed important facts about Azure Data Catalog. Along the way, we also saw how this tool makes users' lives easier by helping them discover, understand, and consume data sources. If you have any questions, feel free to ask in the comments section below.

Translated from: https://www.sqlshack.com/getting-started-with-azure-data-catalog/
