ssis组件

In this article, we will talk briefly about data quality in SQL Server. Then, we will give a brief overview of the Melissa Data Quality for SQL Server Integration Services (SSIS), and we will demonstrate the components available in the community edition.

在本文中,我们将简要讨论SQL Server中的数据质量。 然后,我们将简要概述SQL Server集成服务(SSIS)的Melissa数据质量,并演示社区版中可用的组件。

介绍 (Introduction)

In general, data quality is the level of how much data fits its serving context. Enhancing the data quality is very critical since poor quality leads to inaccurate reporting and results in wrong decisions and inevitably economic damages. For this reason, each data management system provided a bunch of tools used to improve the data quality level.

通常,数据质量是多少数据适合其服务上下文的级别。 数据质量的提高非常关键,因为质量差会导致报告不准确,并导致错误的决策和不可避免的经济损失。 因此,每个数据管理系统都提供了一堆用于提高数据质量级别的工具。

For SQL Server, many technologies can be used to enhance data quality:

对于SQL Server,可以使用许多技术来提高数据质量:

SQL Server数据质量服务(DQS) (SQL Server Data Quality Services (DQS))

Data Quality Services is a knowledge-driven data quality feature developed by Microsoft and released in SQL Server 2012. It can be installed from the SQL Server installation, and it provides different services, such as building a knowledge base, data de-duplication, standardization.

数据质量服务是由Microsoft开发并在SQL Server 2012中发布的知识驱动的数据质量功能。可以从SQL Server安装中安装它,并且它提供不同的服务,例如构建知识库,重复数据删除,标准化。

To learn more about this feature, you can refer to the following articles:

要了解有关此功能的更多信息,可以参考以下文章:

  • How to clean data using Data Quality Services and SQL Server Integration Services 如何使用数据质量服务和SQL Server集成服务清除数据
  • How to use SQL Server Data Quality Services to ensure the correct aggregation of data 如何使用SQL Server数据质量服务确保正确的数据聚合
  • How to clean Master Data Services data using Data Quality Services in SQL Server 如何在SQL Server中使用数据质量服务清除主数据服务数据

使用众所周知的Microsoft SSIS组件 (Using the well-known Microsoft SSIS Components)

SQL Server Integration Services provides a bunch of components that can be used to assess and enhance data quality. These operations can be performed at the control flow level, such as data profiling and validation, or the data flow level using fuzzy lookups, conditional splits, derived columns, script component, and others.

SQL Server Integration Services提供了一堆可用于评估和增强数据质量的组件。 这些操作可以在控制流级别上执行,例如数据概要分析和验证,或者使用模糊查找 , 条件拆分 , 派生列 , 脚本组件等在数据流级别上执行。

编写SQL语句 (Writing SQL Statements)

One of the most popular data cleaning approaches is implementing your own logic using SQL statements, which is known as data wrangling. SQL Server provided a bunch of system functions that can be used to improve data quality.

最受欢迎的数据清理方法之一是使用SQL语句实现自己的逻辑,这称为数据整理。 SQL Server 提供了许多系统功能,可用于提高数据质量 。

使用第三方组件 (Using third-party components)

One of the most beautiful things in the Visual Studio IDE is that it is allowed to develop third-party components and integrate them within Microsoft products such as SSIS. Many companies developed third-party SSIS components such as CDATA, Kingwaysoft, and COZYROC.

Visual Studio IDE中最漂亮的功能之一是允许开发第三方组件并将其集成到Microsoft产品(例如SSIS)中。 许多公司开发了第三方SSIS组件,例如CDATA,Kingwaysoft和COZYROC。

Regarding data quality, one of the most popular products in the market is Melissa data quality components for SQL Server.

关于数据质量,市场上最受欢迎的产品之一是用于SQL Server的Melissa数据质量组件。

Melissa SQL Server的数据质量 (Melissa Data Quality for SQL Server)

Melissa data quality tools are a set of SSIS components that are used to clean and enrich data during the data transfer or integration process. Two editions are available:

Melissa数据质量工具是一组SSIS组件,用于在数据传输或集成过程中清理和丰富数据。 有两个版本 :

  1. Enterprise edition: Commercial, contains a wide variety of data quality components and online services
  2. 企业版 :商业 ,包含各种数据质量组件和在线服务
  3. Community edition: Free, but only a few components are available (check the link above) 社区版 :免费,但只有少数组件可用(请查看上面的链接)

In this article, we will be talking about the community edition, and we will briefly illustrate its components.

在本文中,我们将讨论社区版本,并简要说明其组成部分。

下载Melissa数据质量社区版 (Download Melissa data quality community edition)

To download the Melissa data quality community edition, you should navigate to the SQL Server editions page. Then, request a demo by filling the form located on the left side of the page. And make sure to select the community edition.

要下载Melissa数据质量社区版本,您应该导航到SQL Server版本页面 。 然后,通过填写页面左侧的表格来请求演示。 并确保选择社区版本。

After requesting the demo, you will receive an email that contains a link for the web installer with a community license key.

请求演示后,您将收到一封电子邮件,其中包含带有社区许可证密钥的Web安装程序的链接。

Now, you should download the web installer from the link you received. When finished, you should enter the Melissa license key during installation.

现在,您应该从收到的链接下载Web安装程序。 完成后,您应该在安装过程中输入Melissa许可证密钥。

When the installation is done, the Melissa data quality components should appear within the SSIS toolbox (data flow level).

安装完成后,Melissa数据质量组件应出现在SSIS工具箱(数据流级别)中。

If you add any of those components to the data flow, you will see the following notification every time you will try to open its editor.

如果将这些组件中的任何一个添加到数据流,则每次尝试打开其编辑器时都会看到以下通知。

As mentioned in the Melissa SQL Server editions page, few features are available in the community edition:

如Melissa SQL Server版本页面中所述,社区版本提供的功能很少:

  1. Contact Verify component: Only address parsing, name parsing, email correction, and phone formatting operations can be performed 联系人验证组件 :只能执行地址解析,名称解析,电子邮件更正和电话格式化操作
  2. Profiler: Max 50000 records limit 探查器 :最多50000条记录
  3. MatchUp: Max 50000 records limit 对决:最多50000条记录的限制

Note: to run examples, we exported a flat-file from AdventureWorks2017 database, using the following SQL Statement:

注意:要运行示例,我们使用以下SQL语句从AdventureWorks2017数据库导出了平面文件:

SELECT [BusinessEntityID],[Title],[FirstName],[MiddleName],[LastName],[Suffix],REPLACE(LTRIM(RTRIM(ISNULL([FirstName],'') + ' ' + ISNULL([MiddleName],'') + ' ' + ISNULL([LastName],''))),'  ',' ') as [Name],[JobTitle],[PhoneNumber],[PhoneNumberType],[EmailAddress] + '.' as [EmailAddress],[EmailPromotion],[AddressLine1],[AddressLine2],[City],[StateProvinceName],[PostalCode],[CountryRegionName]FROM [AdventureWorks2017].[HumanResources].[vEmployee]
  • Note: We concatenated first, middle and last name to test the name parser. Besides, we added a “.” at the end of the email address to test the email correction feature.

    注意: 我们将名字,中间名和姓氏连接起来以测试名称解析器。 此外,我们添加了“。” 电子邮件地址的末尾以测试电子邮件更正功能。

联系验证组件 (Contact Verify Component)

As mentioned before, there are only four features of the contact verify component available in the community edition:

如前所述,社区版中的联系人验证组件只有四个功能:

  1. Name parsing: This feature is used to split a full name filed into first, middle, and last name fields. Also, it extracts additional information as title, prefix, and suffix. In the community edition, we are able only to extract the last name 名称解析 :此功能用于将全名分为名字,中间名和姓氏字段。 此外,它还提取其他信息作为标题,前缀和后缀。 在社区版中,我们只能提取姓氏
  2. Address parsing: This feature is used to extract additional information from the address field, such as the street name, suffix, mailbox name, and others 地址解析 :此功能用于从地址字段中提取其他信息,例如街道名称,后缀,邮箱名称等
  3. Phone formatting: This feature is used to change the phone number formatting 电话格式 :此功能用于更改电话号码格式
  4. Email correction: This feature is used to remove meaningless characters from an email address 电子邮件更正:此功能用于从电子邮件地址中删除无意义的字符

To test this component, we create a new SSIS project and add the following components:

为了测试此组件,我们创建一个新的SSIS项目并添加以下组件:

  • Flat File Connection Manager: used to establish a connection with the flat file we generated from the AdventureWorks2017 database 平面文件连接管理器 :用于与我们从AdventureWorks2017数据库生成的平面文件建立连接
  • OLE DB Connection Manager: used to establish a connection with Tempdb (we will use it as destination) OLE DB连接管理器 :用于与Tempdb建立连接(我们将其用作目标)
  • Data flow task: where we will add the following components: 数据流任务:我们将在其中添加以下组件:
    • Flat File Source: read from the flat file connection manager 平面文件源:从平面文件连接管理器读取
    • MD Contact Verify: Melissa contact verify component MD联系人验证: Melissa联系人验证组件
    • OLE DB Destination: where data will be loaded OLE DB目标:数据将加载到的位置

To configure the Contact Verify component, first, we have to specify the Melissa data directory. In the Contact Verify editor, go to “File > Advanced Configuration”.

要配置联系验证组件,首先,我们必须指定Melissa数据目录。 在联系人验证编辑器中,转到“文件>高级配置”。

Make sure that the data file path is set to “C:\Program Files\Melissa DATA\SQT\Data”, which is the default data file path.

确保数据文件路径设置为“ C:\ Program Files \ Melissa DATA \ SQT \ Data”,这是默认的数据文件路径。

Now, we will first configure the name parsing feature. In the contact verify editor, we select open the “Name” tab page. Then, we should specify the input Name column and the output Last name column, as shown in the image below.

现在,我们将首先配置名称解析功能。 在联系人验证编辑器中,我们选择打开“名称”标签页。 然后,我们应该指定输入“名称”列和输出“姓氏”列,如下图所示。

Note that even first name and middle name output column are specified by default, they will not generate any data in the community edition. Besides, “Name 2” columns generate data if two names exist in the name field.

请注意,即使默认情况下也指定了名字和中间名输出列,它们在社区版中都不会生成任何数据。 此外,如果名称字段中存在两个名称,则“名称2”列将生成数据。

Next, we should select the “Address” tab page to configure address parsing. Then, we should specify all available input columns, as shown in the image below.

接下来,我们应该选择“地址”标签页来配置地址解析。 然后,我们应该指定所有可用的输入列,如下图所示。

Now, we should press on the “Additional Output Columns” button to specify the output columns generated.

现在,我们应该按“其他输出列”按钮以指定生成的输出列。

You will note that all properties related to the enterprise edition are disabled.

您将注意到与企业版相关的所有属性均被禁用。

Next, we must select the “Phone/Email” tab page to configure the phone formatting and the email correction feature. As shown below, we should specify the input phone and email columns, the output columns, and the desired phone format.

接下来,我们必须选择“电话/电子邮件”标签页来配置电话格式和电子邮件更正功能。 如下所示,我们应该指定输入电话和电子邮件列,输出列以及所需的电话格式。

Next, we must select the “Pass-Through Columns” tab page, to specify what are the columns in the input buffer we need to add to the output buffer.

接下来,我们必须选择“通过列”选项卡页面,以指定需要添加到输出缓冲区的输入缓冲区中的列。

The Contact Verify component allows adding conditional filters to the generated output, which is not supported in the community edition. You can check that in the “Output filter” tab page, where you can only change the output name.

联系人验证组件允许将条件过滤器添加到生成的输出中,社区版本不支持此条件。 您可以在“输出过滤器”选项卡页面中进行检查,在该页面中只能更改输出名称。

After configuring the MD Contact Verify component, we create a new destination table from the OLE DB destination component using by clicking on the “New” button.

配置MD联系人验证组件之后,我们通过单击“新建”按钮从OLE DB目标组件创建新的目标表。

In the end, the data flow task should look like the following:

最后,数据流任务应如下所示:

After executing the package, we can see the component impact from the result table, as shown in the following screenshots:

执行完程序包后,我们可以从结果表中看到组件的影响,如以下屏幕快照所示:

探查器组件 (Profiler component)

The second Melissa data quality component is MD Profiler. It is a data profiling component similar to the SSIS data profiling task. This component is simple; you should select the input, pass-through, and result columns. And each profile data is generated within a separate output, as shown in the screenshots below. Besides, you can perform some analysis on how the data processing is complete and save the profile to an external file.

第二个Melissa数据质量组件是MD Profiler。 它是一个数据分析组件,类似于SSIS数据分析任务。 这个组件很简单; 您应该选择输入,传递和结果列。 每个配置文件数据都在单独的输出中生成,如下面的屏幕快照所示。 此外,您可以对数据处理的完成方式进行一些分析,并将配置文件保存到外部文件中。

Even if this component is mentioned within the available features of the community edition. It still not working correctly since it may not accept a community License key.

即使在社区版的可用功能中提到了此组件,也是如此。 由于它可能不接受社区许可证密钥,因此仍然无法正常工作。

比赛组件 (Matchup component)

The third Melissa data quality free component is the Matchup component. This component is similar to the SSIS lookup transformation but with a de-duplication feature. De-duplication is performed based on match codes ruleset. In the community edition, only nine-match codes are available.

第三个Melissa数据质量免费组件是Matchup组件。 该组件类似于SSIS查找转换,但具有重复数据删除功能。 重复数据删除是根据匹配代码规则集执行的。 在社区版中,只有九个匹配的代码可用。

To perform lookups, you should add a data source and link it to the Lookup component input as shown below.

要执行查找,您应该添加一个数据源并将其链接到Lookup组件输入,如下所示。

结论 (Conclusion)

In this article, we talked briefly about data quality and how to improve it in SQL Server Integration Services (SSIS). We illustrated the Community edition of Melissa data quality and demonstrated the available components; Contact Verify was fully explained while we didn’t provide much information on Profiler and Matchup since they need a separate article. Based on the demonstration, community edition is only used for demonstration while we should buy the enterprise edition since it contains much more powerful tools that we may need at the enterprise level.

在本文中,我们简要讨论了数据质量以及如何在SQL Server Integration Services(SSIS)中提高数据质量。 我们举例说明了社区版的Melissa数据质量,并演示了可用的组件; 尽管我们没有提供有关Profiler和Matchup的大量信息,但已充分解释了Contact Verify,因为它们需要单独的文章。 基于演示,社区版仅用于演示,而我们应该购买企业版,因为它包含我们在企业级可能需要的功能更强大的工具。

翻译自: https://www.sqlshack.com/melissa-data-quality-free-components-for-ssis/

ssis组件

ssis组件_用于SSIS的Melissa Data Quality免费组件相关推荐

  1. 使用vue创建一个遮罩组件_用于输入遮罩的另一个Vue组件

    使用vue创建一个遮罩组件 语音输入蒙版 (vue-input-mask) Yet another Vue component for input masking. Based on react-in ...

  2. vue 图表组件_用于制作图表的简单轻巧的Vue组件

    vue 图表组件 纯Vue图表 (Pure Vue Chart) A simple and lightweight vue component for making charts that do no ...

  3. 自定义键盘组件_一文读懂!iOS系统组件的设计规范全解

    一说到组件,大部分初级设计师和中级设计师的脑海里只会蹦出弹窗.Toast及操作列表等具体的概念,没有一套属于自己的组件分类体系,这样对于视觉设计.交互设计或产品设计的系统学习来说都是不利的. iOS1 ...

  4. 自定义键盘组件_一文读懂!iOS系统组件的设计规范全解!

    一说到组件,大部分初级设计师和中级设计师的脑海里只会蹦出弹窗.Toast及操作列表等具体的概念,没有一套属于自己的组件分类体系,这样对于视觉设计.交互设计或产品设计的系统学习来说都是不利的. iOS1 ...

  5. ssis组件_使用SSIS Hadoop组件连接到Apache Hive和Apache Pig

    ssis组件 In our previously published articles in this series, we talked about many SSIS Hadoop compone ...

  6. java 执行ssis包_在SSIS包中使用CHECKPOINT重新启动包执行

    java 执行ssis包 In the article, SQL Server CHECKPOINT, Lazy Writer, Eager Writer and Dirty Pages in SQL ...

  7. jsf用于页面判断的标签_用于操作和导航的JSF命令组件标签

    jsf用于页面判断的标签 JSF provides two command component tags for performing action and navigation. They are ...

  8. vue图片裁剪组件_使用Vue-Rx的Vue.js图像裁剪组件

    vue图片裁剪组件 Vuejs夹 (vuejs-clipper) Vue.js image clipping components using Vue-Rx. 使用Vue-Rx的Vue.js图像裁剪组 ...

  9. phpstudy安装sg11组件_宝塔面板一键安装配置SG11加密组件教程

    宝塔面板用户安装SG11组件教程 以下教程适用于 宝塔 windows系统 6.x 以上面板 和 linux系统 7.x 以上版本 1.进入宝塔面板,点击软件管理,找到你网站对应的php如下图操作 2 ...

最新文章

  1. SAP 操作,弹出报错 - 已根据规则拒绝服务器触发的操作,是否要查看上个通讯步骤中触发的操作列表 -
  2. 802.1AE保护局域网安全
  3. python在windows的开发环境_Python合集之Python开发环境在Windows系统里面搭建
  4. Mac下php 5升级到php 7的步骤详解
  5. 使用libsvm对MNIST数据集进行实验
  6. 团队博客作业-团队个人贡献分分配规则
  7. 机器人彩铅画_彩铅画嗔
  8. MaxCompute studio与权限那些事儿 1
  9. ServicePointManager.ServerCertificateValidationCallback 冲突的解决
  10. 5G催化、VR回春,千亿行业洗牌重来
  11. UDS(ISO14229)诊断服务功能及描述完结篇
  12. 举例说明计算机在现实应用中的作用,【举例说明多媒体技术的应用】_论多媒体技术在日常工作生活中的应用...
  13. 502 Bad Gateway错误
  14. 递归实现指数型枚举(递归)
  15. 魔兽各服务器位置,魔兽世界怀旧服矿点分布在哪 全地图采矿位置一览(多图)
  16. 流畅的python学习笔记第一篇之为什么要学习这本书
  17. julia常用矩阵函数_Julia 多维数组
  18. Unity3d 录屏功能教程指南 | Cross Platform Replay Kit
  19. [转] 用小铲子挖大坑
  20. STM32开发方式及基本介绍

热门文章

  1. 服务器重装系统要注意什么_企业租用美国服务器,应该注意什么?
  2. python提供了两种基本的数值类型_Python的三种基本数据类型
  3. bzoj1562[NOI2009] 变换序列
  4. Notification小案例
  5. 5.1作业5 四则运算 测试与封装
  6. 【Android Developers Training】 20. 创建一个Fragment
  7. 计算机网络学习笔记(2. 什么是网络协议)
  8. C++实验课任务(多态--容器--算法)
  9. 为什么有人说手机很少有白色的出现了?
  10. 支付宝借呗还款利息为什么都不一样?