ssis 派生列

In this article, we will first give an overview on SSIS derived column transformation, then we will run an experiment to check if there is any difference between adding multiple expressions within one derived column transformation and adding a derived column transformation for each expression.

在本文中,我们将首先概述SSIS派生的列转换,然后运行一个实验,以检查在一个派生的列转换中添加多个表达式与为每个表达式添加派生的列转换之间是否有任何区别。

This article is the sixth article in the SSIS feature series which aims to remove confusion and to illustrate some of the differences between similar features provided by SQL Server Integration Services.

本文是SSIS功能系列的第六篇文章,旨在消除混淆并说明SQL Server Integration Services提供的相似功能之间的某些区别。

派生列转换 (Derived Column Transformation)

SSIS Derived Column Transformation is one of the SQL Server Integration Services that can be added within a Data Flow Task, it is used to add a new column to the data pipeline by applying SSIS expressions. The developer can choose whether to add a new derived column or to replace an existing column.

SSIS派生列转换是可以在数据流任务中添加SQL Server集成服务之一,它用于通过应用SSIS表达式向数据管道添加新列。 开发人员可以选择添加新的派生列还是替换现有列。

SSIS expressions are a combination of literals, functions, operators that yields a single data value. An expression can be composed of a single value (“abc”) or a variable (@[User::Variable1]) or a function (GETDATE()), or it can be more complex and contains some conditionals (CASE statement) or some mathematical operations or some string manipulation. To learn more about SSIS expressions you can refer to the following article from earlier in the series: SSIS Expression Tasks Vs Evaluating variables as expressions

SSIS表达式是文字,函数和运算符的组合,产生单个数据值。 表达式可以由单个值(“ abc”)或变量(@ [User :: Variable1])或函数(GETDATE())组成,也可以更复杂并且包含一些条件(CASE语句)或一些数学运算或一些字符串操作。 要了解有关SSIS表达式的更多信息,可以参考本系列前面的以下文章: SSIS表达式任务Vs将变量作为表达式求值

SSIS Derived Column Transformation can be added from the SSIS toolbox.

可以从SSIS工具箱中添加SSIS派生列转换。

The SSIS Derived Column editor is composed of 4 parts as shown in the image below:

SSIS派生列编辑器由4个部分组成,如下图所示:

  1. Columns and Variables List 列和变量列表
  2. Expressions built-in functions 表达式内置函数
  3. Derived Columns Grid 派生列网格
  4. Error Handling 错误处理

The first two parts are explained in the previous article that we mentioned above. In this section, we will describe the other parts and we will try to give some of the most known expression and use cases.

我们前面提到的上一篇文章解释了前两个部分。 在本节中,我们将描述其他部分,并尝试给出一些最知名的表达式和用例。

派生列网格 (Derived Columns Grid)

The Derived Columns Grid contains the following configurations:

派生列网格包含以下配置:

  • Derived Column Name: Specify the derived column name 派生列名称:指定派生列名称
  • Derived Column: Select between <add new column> and <replace an existing column> options 派生列:在<添加新列>和<替换现有列>选项之间选择
  • Expression: Write the SSIS expression that produce the derived column 表达式:编写产生派生列的SSIS表达式
  • Data Type: This column is read-only and it shows the data type of the expression result 数据类型:此列是只读的,它显示表达式结果的数据类型
  • Length: This column is read =-only and it shows the column length based on the expression result 长度:此列为只读,仅显示,基于表达式结果显示列长
  • Scale: This column is read-only and it automatically sets the scale for numeric data based on the data type 此列为只读列,它会根据数据类型自动设置数字数据的小数位数
  • Precision: This column is read-only and it automatically sets the precision for numeric data based on the data type 精度:此列是只读的,它会根据数据类型自动设置数字数据的精度
  • Code Page: This column can be edited and it automatically sets code page for the 代码页:可以编辑此列,它会自动为DT_STR (string) data type DT_STR (字符串)数据类型设置代码页

The Derived Columns Grid contains an empty line that is used to add a new derived column. When start filling the derived column properties within this line a new empty line is added to allow the user to add more columns.

“派生列”网格包含一个空行,用于添加新的派生列。 当开始填充此行中的派生列属性时,将添加一个新的空行,以允许用户添加更多列。

If the expression cannot be applied, its color is changed to red as shown in the image below:

如果无法应用该表达式,则其颜色将变为红色,如下图所示:

When you put the mouse cursor over the expression, a tooltip appears with the error description:

当您将鼠标指针放在表达式上时,将显示一个工具提示,其中包含错误说明:

To learn more about SSIS Derived Column Transformation you can refer to the following official documentation:

要了解有关SSIS派生列转换的更多信息,可以参考以下官方文档:

  • SSIS Derived Column Transformation SSIS派生列转换
  • Derive Column Values by Using the Derived Column Transformation 通过使用派生列转换来派生列值

错误处理 (Error Handling)

In SSIS Derived column, errors are mostly caused by expression evaluation errors, which occur if expressions that are evaluated at a run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.

在“ SSIS派生”列中,错误主要由表达式评估错误引起,如果在运行时评估的表达式执行无效操作或由于缺少或不正确的数据值而在语法上变得不正确,则会发生错误。

Error output configuration is an option that is included in most of the Data Flow Task components. In this configuration you can select how this component behaves when an evaluation error or a truncation occurs:

错误输出配置是大多数“数据流任务”组件中包含的选项。 在此配置中,您可以选择在发生评估错误或截断时该组件的行为:

  • Fail component 失败组件
  • Ignore Failure 忽略故障
  • (Error output Flow) (错误输出流)

In this article, we have considered error output configuration as a part of SSIS Derived Column Transformation since it is required to implement many popular use cases such as TRY_PARSE SQL function (will be described in the next section).

在本文中,我们考虑将错误输出配置作为SSIS派生列转换的一部分,因为它需要实现许多流行的用例,例如TRY_PARSE SQL函数(将在下一节中进行介绍)。

To learn more about Error output configuration in SSIS, you can refer to the following documentation:

要了解有关SSIS中错误输出配置的更多信息,可以参考以下文档:

  • Error Handling in Data 数据中的错误处理

示例和用例 (Examples and Use cases)

In this section, we will list some of the most popular SSIS Derived Column examples and use cases and provide an expression or an external link that contains an example:

在本节中,我们将列出一些最受欢迎的SSIS派生列示例和用例,并提供一个包含示例的表达式或外部链接:

  • Concatenating first name and last name into one column:

    将名字和姓氏连接到一个列中:

    [FirstName] + “ “ + [LastName]

    [名字] +““ + [姓氏]

  • Extract a part of a string:

    提取字符串的一部分:

    SUBSTRING([Address],1,5)

    SUBSTRING([地址],1,5)

  • Change date to integer in yyyyMMdd format:

    将日期更改为yyyyMMdd格式的整数:

    (DT_I4)((DT_WSTR,4)YEAR([DateColumn]) +

    (DT_I4)((DT_WSTR,4)YEAR([DateColumn])+

    RIGHT( “00” + (DT_WSTR,2)MONTH([DateColumn]),2) +

    RIGHT(“ 00” +(DT_WSTR,2)MONTH([DateColumn]),2)+

    RIGHT( “00” + (DT_WSTR,2)DAY([DateColumn]),2))

    RIGHT(“ 00” +(DT_WSTR,2)DAY([DateColumn]),2))

    • How do I extract file name from a path within an SSIS package? 如何从SSIS包中的路径提取文件名?
    • How to achieve TRY_CONVERT functionality in SSIS? 如何在SSIS中实现TRY_CONVERT功能?
  • Replace Nulls:

    替换空:

    REPLACENULL([Address],”Unknown”)

    REPLACENULL([地址],“未知”)

  • Remove Leading and Trailing spaces from string:

    从字符串中删除前导和尾随空格:

    LTRIM(RTRIM([Address]))

    LTRIM(RTRIM([地址]))

多表达式与多派生列转换 (Multiple Expressions Vs Multiple Derived Column Transformation)

After describing the SSIS Derived Column Transformation component, now we will try to make a comparison between adding multiple expressions into one Derived Column Transformation or adding multiple Derived Column Transformations.

在描述了SSIS派生列转换组件之后,现在我们将尝试在将多个表达式添加到一个派生列转换或添加多个派生列转换之间进行比较。

性能 (Performance )

To compare both approaches from a performance perspective we have run the following experiment; we created two packages each one contains one Data Flow Task. We have used the [Person].[Person] table from the AdventureWorks2017 database as source. And we applied the following derived columns expressions in each package:

为了从性能角度比较这两种方法,我们进行了以下实验; 我们创建了两个包,每个包包含一个数据流任务。 我们已使用AdventureWorks2017数据库中的[Person]。[Person]表作为源。 并且我们在每个包中应用了以下派生的列表达式:

  • Name:

    名称:

    [FirstName] + ” ” + [LastName]

    [名字] +”” + [姓氏]

  • Full Name:

    全名:

    [FirstName] + ” ” + [MiddleName] + ” ” + [LastName]

    [名字] +”” + [中间名] +” + [姓氏]

  • Father Full Name:

    父亲全名:

    [MiddleName] + ” ” + [LastName]

    [中间名] +”” + [姓氏]

  • NumericDate:

    数值日期:

    (DT_I4)((DT_WSTR,4)YEAR(ModifiedDate) + RIGHT(“00” + (DT_WSTR,2)MONTH(ModifiedDate),2) + RIGHT(“00” + (DT_WSTR,2)DAY(ModifiedDate),2))

    (DT_I4)((DT_WSTR,4)YEAR(ModifiedDate)+ RIGHT(“ 00” +(DT_WSTR,2)MONTH(ModifiedDate),2)+ RIGHT(“ 00” +(DT_WSTR,2)DAY(ModifiedDate),2 ))

In the first package we added only one Derived column transformation as shown in the images below:

在第一个包中,我们仅添加了一个“派生”列转换,如下图所示:

And in the second package, each expression was added in a separate SSIS derived column transformation as shown in the image below:

在第二个程序包中,每个表达式都添加到一个单独的SSIS派生列转换中,如下图所示:

After executing both packages, the results show that adding multiple expressions into one SSIS derived column transformation lasted about 02.797 seconds while adding multiple SSIS derived column transformation lasted 03.329 seconds.

执行完两个程序包后,结果表明,将多个表达式添加到一个SSIS派生的列转换中大约持续02.797秒,而添加多个SSIS派生的列转换则持续03.329秒。

The main cause was that each component has its own execution phases (Validation, Pre Execute, Execute, Post Execute) which requires more time.

主要原因是每个组件都有其自己的执行阶段(验证,执行前,执行,执行后),这需要更多时间。

相依性 (Dependency)

Note that, we can put all derived columns within one transformation component if they are independent, because if one derived column expression used another one, we must add a component for each one.

请注意,如果所有派生的列都是独立的 ,则可以将它们放在一个转换组件中 ,因为如果一个派生的列表达式使用了另一个,则我们必须为每个派生的组件添加一个组件。

错误处理 (Error Handling)

The last thing to mention is that using one Transformation component with multiple expression will make the error handling process more difficult when we need to redirect erroneous rows since the error is caused by one of the expressions and it is harder to know which expression caused the error. While when using multiple transformation components, error handling is easier.

最后要提到的是,当我们需要重定向错误的行时,将一个Transformation组件与多个表达式一起使用将使错误处理过程更加困难,因为该错误是由其中一个表达式引起的,并且更难知道哪个表达式导致了错误。 当使用多个转换组件时,错误处理更加容易。

结论 (Conclusion)

In this article, we have described the Derived Column Transformation in SSIS, and we ran an experiment to illustrate the difference between adding multiple expression within one derived column transformation and adding one derived column transformation component for each expression. The results showed that from a performance perspective using the first approach is more efficient while the second provide an easier error handling.

在本文中,我们描述了SSIS中的派生列转换,并且我们进行了一个实验来说明在一个派生列转换中添加多个表达式与为每个表达式添加一个派生列转换组件之间的区别。 结果表明,从性能的角度来看,使用第一种方法更有效,而第二种方法更易于处理错误。

目录 (Table of contents)

SSIS OLE DB Source: SQL Command vs Table or View
SSIS Expression Tasks vs Evaluating variables as expressions
SSIS OLE DB Destination vs SQL Server Destination
Execute SQL Task in SSIS: SqlStatementSource Expressions vs Variable Source Types
Execute SQL Task in SSIS: Output Parameters vs Result Sets
SSIS Derived Columns with Multiple Expressions vs Multiple Transformations
SSIS Data types: Change from the Advanced Editor vs Data Conversion Transformations
SSIS Connection Managers: OLE DB vs ODBC vs ADO.NET
SSIS Flat Files vs Raw Files
SSIS Foreach Loop vs For Loop Container
SSIS: Execute T-SQL Statement Task vs Execute SQL Task
SSIS OLE DB来源:SQL命令与表或视图
SSIS表达式任务与将变量作为表达式求值
SSIS OLE DB目标与SQL Server目标
在SSIS中执行SQL任务:SqlStatementSource表达式与可变源类型
在SSIS中执行SQL任务:输出参数与结果集
具有多个表达式与多个转换的SSIS派生列
SSIS数据类型:高级编辑器的更改与数据转换的转换
SSIS连接管理器:OLE DB与ODBC与ADO.NET
SSIS平面文件与原始文件
SSIS Foreach循环与For循环容器
SSIS:执行T-SQL语句任务与执行SQL任务

翻译自: https://www.sqlshack.com/ssis-derived-columns-with-multiple-expressions-vs-multiple-transformations/

ssis 派生列

ssis 派生列_具有多个表达式与多个转换的SSIS派生列相关推荐

  1. 将整数k转换成实数python表达式_在混合类型表达式中python自动转换intlongfloat例如.ppt...

    在混合类型表达式中python自动转换intlongfloat例如 空位:格式指示符,描述了填入的值的输出形式. %. 类型字符: 宽度:用多少位置显示数值. 省略或指定为0:根据值的实际长度显示. ...

  2. mysql将不同行数结果合并成多列_将多行合并到mysql中的一行和多列

    我正在MYSQL工作,需要将用户数据提取到一个视图中.我将使用电子邮件客户端中的数据,因此我无法在应用程序层中执行此操作. 问题是用户的每个数据字段都包含在单独的行中(这是Wordpress如何设置数 ...

  3. ssis 表达式任务_在SSIS中执行SQL任务:SqlStatementSource表达式与可变源类型

    ssis 表达式任务 In this article, I will give an overview of Execute SQL Task in SSIS and I will try to il ...

  4. SSIS数据转换组件_复制转换、数据转换、行计数

    一 复制转换 复制列转换是一种很简单的转换机制.它主要是将一列或多列地数据复制一份.然后后续的工作可以对复制的列中的数据做任何操作.而不影响被复制列的数据.下面我们接着上一篇的条件性拆分的包来介绍如何 ...

  5. pandas 取excel 中的某一列_干货Python Pandas 做数据分析之玩转 Excel 报表分析

    本篇文章选自作者在 GitChat 的分享,若有什么问题,可在公众号回复「小助手」添加小助手微信,邀请你进入技术交流群. 各位朋友大家好,非常荣幸和大家聊一聊用 Python Pandas 处理 Ex ...

  6. python输入数学表达式并求值_用Python3实现表达式求值

    一.题目描述 请用 python3编写一个计算器的控制台程序,支持加减乘除.乘方.括号.小数点,运算符优先级为括号>乘方>乘除>加减,同级别运算按照从左向右的顺序计算. 二.输入描述 ...

  7. python怎么添加列_如何将列添加到DataFrame列?添加新列详细解释————

    本文概述 我们可以使用不同的方式将新列添加到现有DataFrame中.对于演示, 首先, 我们必须编写代码以读取现有文件, 该文件由DataFrame中的某些列组成. import pandas as ...

  8. excel根据一列内容查找排序另一列_利用泰坦尼克生存数据讲述Excel数据分析的基本操作...

    一.使用Excel做数据处理方法(博主使用的是2007版本) 打开泰坦尼克生存数据文件,文件共有12列,891行. 每一列的含义是乘客ID,生存,船舱等级,名字,性别,年龄,兄弟姐妹,父母配偶,票号, ...

  9. pandas中如何选取某几列_【python】pandas中 loc amp; iloc用法及区别

    在刚学习Python的时候,对于loc.iloc.at.iat.ix有点混乱,没有进行过整理和梳理.所以针对这几种用法进行一次案例的整理.本次优先整理loc和iloc SQL中的select是根据列的 ...

最新文章

  1. 【转载】DL图片预处理
  2. poj 3984
  3. 通达信波段王指标公式主图_通达信指标公式源码超赢天下主图源码
  4. 第16天学习Java的笔记(标准类,Scanner)
  5. 一些常用的基本的liunx命令:
  6. jeecg选择按钮带入其他单据值
  7. 创建mysql windows服务_MySQL的安装及创建(windows下操作)
  8. Verilog中parameter(参数)与define(宏定义)的区别
  9. POJ 2149 ZOJ 2267UVA 1337UVALive 3192 Inherit the Spheres
  10. 软件测试论坛_进阶测试攻略——价值驱动的软件测试
  11. mysql alter auto increment_将MySQL列更改为AUTO_INCREMENT
  12. 526个儿童游戏APP应用音效游戏声音集合音效素材 Kids Game Sound Effects
  13. pdca管理循环基本主张_两个经典的循环管理法:PDCA SDCA
  14. 手机app测试方法(二)App测试-常见测试点
  15. Python——百度识图-相似图片爬虫下载解决方案
  16. Flash:任意变形工具的使用
  17. android-实现一个简单的视频弹幕
  18. 路由器 telnet配置
  19. 为什么工资高的程序员很少炫富?
  20. 一步一步学RMAN第五篇 RMAN基础知识补充 一

热门文章

  1. js怎么select 选中的值
  2. 谜题27:变幻莫测的i值
  3. flask结合令牌桶算法实现上传和下载速度限制
  4. 关于OATUH中的AUTHRAZITON CODE和TOKEN的关系,实际上就是这么回事
  5. js 把字符串格式化成时间
  6. 使用C#读取XML节点,修改XML节点
  7. IOS开发中--点击imageView上的Button没有任何反应
  8. ASP.NET程序中常用代码汇总(五)
  9. python网络虫营销_python与sem,大数据降低竞价推广成本利器
  10. ip、子网掩码、默认网关以及传输过程