ETL Optimization Using SQL Server TRY Functions

Introduction

An enterprise data warehouse ETL solution typically includes, among other steps, a transformation step that converts source data from one data type to another. Type conversion errors tend to occur during this step and, depending on the exception handling implemented in the ETL solution (or the lack of it), identifying and resolving them can frustrate ETL developers and DBAs alike. In this article we look at three SQL TRY built-in functions introduced in SQL Server 2012, namely TRY_PARSE, TRY_CAST and TRY_CONVERT, and at how they can be used to reduce type conversion errors in ETL solutions, thereby saving developers needless troubleshooting.

Challenge

The benefits of the SQL TRY functions TRY_PARSE, TRY_CAST and TRY_CONVERT in an ETL solution are best appreciated by first demonstrating the limitations of their PARSE, CAST and CONVERT counterparts. To do so, we use a football (soccer) example in which data from Table 1 is used to populate a Nominees dimension that stores the top 3 nominees for the 2016 FIFA Ballon d’Or.

Nominee           | Club            | Jersey Number | Votes | Date of Birth    | Place of Birth | Nationality | Height (m)
Antoine Griezmann | Atletico Madrid | 7             | 198   | 21 March 1991    | Mâcon          | France      | 1.75
Lionel Messi      | FC Barcelona    | 10            | 316   | 24 June 1987     | Rosario        | Argentina   | 1.70
Cristiano Ronaldo | Real Madrid     | 7             | 745   | 05 February 1985 | Funchal        | Portugal    | 1.85

Table 1

The steps followed to load the Nominees dimension correspond to the ETL acronym: Extract, Transform and Load. During the Extract step, we import the data from Table 1 into a staging table in which all fields are defined as variable-length character strings (VARCHAR). The next step, Transform, converts fields from the staging table into other data types. For the purposes of this demo, this step converts the data types of the fields highlighted in Table 2.
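
As a minimal sketch of the Extract step, assuming a temporary staging table named #STG_Nominees whose columns follow the Table 1 headings (the table name, column names and VARCHAR lengths are illustrative, not taken from the original solution), the source data can be landed as plain strings so that no conversion happens yet:

```sql
-- Hypothetical staging table for the Extract step (name, columns and lengths
-- are assumptions). Every column is VARCHAR, so any value can be landed
-- without a type conversion error.
IF OBJECT_ID('tempdb..#STG_Nominees') IS NOT NULL DROP TABLE #STG_Nominees;

CREATE TABLE #STG_Nominees (
    [Nominee]        VARCHAR(100),
    [Club]           VARCHAR(100),
    [Jersey Number]  VARCHAR(10),
    [Votes]          VARCHAR(10),
    [Date of Birth]  VARCHAR(50),
    [Place of Birth] VARCHAR(100),
    [Nationality]    VARCHAR(100),
    [Height]         VARCHAR(10)
);

-- Load the Table 1 rows exactly as they appear in the source.
INSERT INTO #STG_Nominees
VALUES ('Antoine Griezmann', 'Atletico Madrid', '7', '198', '21 March 1991', 'Mâcon', 'France', '1.75'),
       ('Lionel Messi', 'FC Barcelona', '10', '316', '24 June 1987', 'Rosario', 'Argentina', '1.70'),
       ('Cristiano Ronaldo', 'Real Madrid', '7', '745', '05 February 1985', 'Funchal', 'Portugal', '1.85');

SELECT * FROM #STG_Nominees;
```

Because every staging column is VARCHAR, this load cannot fail on a type conversion; all of that risk moves to the Transform step.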

Unlike the Extract step, the Transform step can be implemented in several ways: by using a Data Flow Task to convert data between Source and Destination components; by using the Data Conversion transformation to convert data types; or simply by using a T-SQL script, in the form of a SQL Server stored procedure or view, that uses the CONVERT and CAST functions to perform type conversion. Figure 1 shows how the Data Conversion transformation in SSIS can be used to convert the data as required by Table 2.

Alternatively, Script 1 shows an equivalent T-SQL option for performing the type conversions.

SELECT [Player Name], [Club], CONVERT(INT, [Jersey Number]) AS [Jersey Number],
       CAST([Date of Birth] AS DATE) AS [Date of Birth], [Place of Birth], [Country of Birth],
       PARSE([Height] AS NUMERIC(3, 2)) AS [Height], [Marital Status]
FROM [SQLSHACK].[dbo].[STG_FIFABallonDOr];

Script 1

The challenge with the traditional T-SQL type conversion functions arises whenever a conversion fails because of an invalid input data format. For instance, say we alter the values in the [Date of Birth] staging column by adding the suffix “_SQLSHack”, as shown in Figure 2.

Such a change to the [Date of Birth] staging column causes both the SSIS and the T-SQL Transform steps to fail; the T-SQL step, for example, fails with the following error message:

Msg 241, Level 16, State 1, Procedure sp_loadDim, Line 3 [Batch Start Line 21]
Conversion failed when converting date and/or time from character string.
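
The failure can be reproduced without the original staging table. The sketch below assumes two hypothetical temp tables, #StgDob and #DimDob, and a single bad value; it shows that one unconvertible row makes CAST fail the whole INSERT statement, so even the valid rows are not loaded:

```sql
-- Hypothetical staging (#StgDob) and dimension (#DimDob) tables for the demo.
IF OBJECT_ID('tempdb..#StgDob') IS NOT NULL DROP TABLE #StgDob;
IF OBJECT_ID('tempdb..#DimDob') IS NOT NULL DROP TABLE #DimDob;
CREATE TABLE #StgDob ([Date of Birth] VARCHAR(50));
CREATE TABLE #DimDob ([Date of Birth] DATE);

INSERT INTO #StgDob ([Date of Birth])
VALUES ('21 March 1991'), ('24 June 1987_SQLSHack'), ('05 February 1985');

DECLARE @ConversionError INT = 0;

BEGIN TRY
    -- One invalid value makes CAST raise an error for the entire statement.
    INSERT INTO #DimDob ([Date of Birth])
    SELECT CAST([Date of Birth] AS DATE) FROM #StgDob;
END TRY
BEGIN CATCH
    SET @ConversionError = ERROR_NUMBER();  -- 241, as in the message above
END CATCH;

-- The failed INSERT is rolled back as a unit: no rows are loaded,
-- including the two valid dates.
SELECT @ConversionError AS ErrorNumber, COUNT(*) AS RowsLoaded FROM #DimDob;
```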

As a result of such type conversion errors, the entire ETL run is likely to fail, which is far from ideal, particularly if you only have one processing window a day in which to run your ETLs. A mere data type conversion error could then mean waiting another day for the next opportunity to run your ETLs, which is inconvenient not just for you but for your business users too.

Solution

Whilst some errors in your ETL may be unpredictable and therefore unavoidable, you can stop type conversion errors from failing the load by redirecting error rows to a separate destination in your SSIS Data Flow Task. You could also refactor your T-SQL type conversion script to use the ISDATE and ISNUMERIC functions to first check whether the values you are converting are in fact of the expected data type. In this article, however, I propose a simpler approach to avoiding type conversion errors: using the SQL TRY functions TRY_PARSE, TRY_CAST and TRY_CONVERT.
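
For comparison, here is a minimal sketch of the ISDATE/ISNUMERIC guard pattern. It uses hypothetical inline sample values rather than the article's staging table, and fixes the session language so that month names are read consistently:

```sql
-- Guarding conversions with ISDATE / ISNUMERIC (the pre-TRY-function approach).
-- Sample values are illustrative; us_english makes month names parse predictably.
SET LANGUAGE us_english;

SELECT v.[Date of Birth],
       CASE WHEN ISDATE(v.[Date of Birth]) = 1
            THEN CAST(v.[Date of Birth] AS DATE) END AS [Converted Date of Birth],
       v.[Jersey Number],
       CASE WHEN ISNUMERIC(v.[Jersey Number]) = 1
            THEN CAST(v.[Jersey Number] AS INT) END AS [Converted Jersey Number]
FROM (VALUES ('21 March 1991', '7'),
             ('24 June 1987_SQLSHack', '10')) AS v([Date of Birth], [Jersey Number]);
```

The guard is not watertight: ISNUMERIC returns 1 for some strings that are not valid integers, such as '$', so CAST(... AS INT) can still fail, and ISDATE depends on the session's language and date format settings.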

Script 2 shows an updated version of Script 1 that replaces the traditional T-SQL type conversion functions with their SQL TRY counterparts TRY_PARSE, TRY_CAST and TRY_CONVERT. As a result, instead of the error message shown in Figure 3, a NULL value is returned for every [Date of Birth] value that could not be converted, as shown in Figure 4. The NULL indicates that the attempt to convert the value to a date was unsuccessful, without an error being thrown. This is, in essence, the benefit of the TRY functions: instead of throwing an error, they return NULL for any value they cannot convert.

SELECT [Player Name], [Club], TRY_CONVERT(INT, [Jersey Number]) AS [Jersey Number],
       TRY_CAST([Date of Birth] AS DATE) AS [Date of Birth], [Place of Birth], [Country of Birth],
       TRY_PARSE([Height] AS NUMERIC(3, 2)) AS [Height], [Marital Status]
FROM [SQLSHACK].[dbo].[STG_FIFABallonDOr];

Script 2
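
A self-contained sketch of the same idea, assuming hypothetical temp tables #StgTry and #DimTry in place of the article's staging table. Because a NULL result is the only signal that a conversion failed, the second query lists rows whose source value was present but converted to NULL; such rows could, for example, be routed to an error or audit table:

```sql
-- Hypothetical staging (#StgTry) and dimension (#DimTry) tables for the demo.
SET LANGUAGE us_english;
IF OBJECT_ID('tempdb..#StgTry') IS NOT NULL DROP TABLE #StgTry;
IF OBJECT_ID('tempdb..#DimTry') IS NOT NULL DROP TABLE #DimTry;

CREATE TABLE #StgTry ([Player Name] VARCHAR(100), [Date of Birth] VARCHAR(50));
INSERT INTO #StgTry VALUES
    ('Antoine Griezmann', '21 March 1991'),
    ('Lionel Messi', '24 June 1987_SQLSHack'),
    ('Cristiano Ronaldo', '05 February 1985');

-- The load no longer fails: the unconvertible value simply becomes NULL.
SELECT [Player Name], TRY_CAST([Date of Birth] AS DATE) AS [Date of Birth]
INTO #DimTry
FROM #StgTry;

-- A source value that was present but converted to NULL marks a failed conversion.
SELECT s.[Player Name], s.[Date of Birth] AS [Raw Date of Birth]
FROM #StgTry AS s
WHERE s.[Date of Birth] IS NOT NULL
  AND TRY_CAST(s.[Date of Birth] AS DATE) IS NULL;
```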

In terms of arguments, TRY_PARSE and TRY_CONVERT take the same mandatory and optional parameters as PARSE and CONVERT respectively: an optional culture (USING) for TRY_PARSE and an optional style for TRY_CONVERT. Likewise, TRY_CAST takes the same arguments as CAST, an expression and a target data type, with no style or culture option. In terms of performance, TRY_PARSE may incur additional overhead because it relies on the .NET Framework Common Language Runtime (CLR) rather than being a native T-SQL conversion. It is also limited to converting strings to date/time and numeric types, so it does not support conversion to data types such as XML, TEXT and VARBINARY.
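
To illustrate the optional arguments, the sketch below converts the same day-first date string with each TRY function. It assumes a session DATEFORMAT of mdy (set explicitly here to stand in for a US-style session); the literal value and the 'en-GB' culture are illustrative choices:

```sql
-- TRY_CAST has no style or culture argument; TRY_CONVERT accepts an optional
-- style; TRY_PARSE accepts an optional culture via USING.
SET DATEFORMAT mdy;

SELECT TRY_CAST('24/06/1987' AS DATE)                AS [TRY_CAST],     -- NULL: 24 is read as the month
       TRY_CONVERT(DATE, '24/06/1987', 103)          AS [TRY_CONVERT],  -- 1987-06-24 (style 103 = dd/mm/yyyy)
       TRY_PARSE('24/06/1987' AS DATE USING 'en-GB') AS [TRY_PARSE];    -- 1987-06-24 (British culture)
```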

However, one advantage TRY_PARSE has over TRY_CAST and TRY_CONVERT is that it can successfully convert some inputs for which TRY_CAST and TRY_CONVERT return NULL. For instance, according to timeanddate.com, Lionel Messi’s date of birth (1987-06-24) fell on a Wednesday. If we change the format of his [Date of Birth] staging value to “Wednesday, 24 June 1987” and update our T-SQL type conversion script as per Script 3, we get the output shown in Figure 5, which indicates that only TRY_PARSE managed to convert Messi’s date of birth.

SELECT [Player Name], [Club], TRY_CAST([Date of Birth] AS DATE) AS [TRY_CAST],
       TRY_CONVERT(DATE, [Date of Birth]) AS [TRY_CONVERT],
       TRY_PARSE([Date of Birth] AS DATE) AS [TRY_PARSE]
FROM [SQLSHACK].[dbo].[STG_FIFABallonDOr];

Script 3
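
Given the CLR overhead of TRY_PARSE, one possible compromise (a suggestion, not part of the original article) is to try the native TRY_CONVERT first and fall back to TRY_PARSE only when it returns NULL. The sketch assumes an English session language, the 'en-US' culture for TRY_PARSE, and hypothetical inline sample values:

```sql
-- Hybrid conversion: native TRY_CONVERT first, CLR-based TRY_PARSE as a fallback.
SET LANGUAGE us_english;

SELECT v.[Raw Value],
       COALESCE(TRY_CONVERT(DATE, v.[Raw Value]),
                TRY_PARSE(v.[Raw Value] AS DATE USING 'en-US')) AS [Date of Birth]
FROM (VALUES ('21 March 1991'),
             ('Wednesday, 24 June 1987'),
             ('05 February 1985')) AS v([Raw Value]);
```

Because COALESCE is evaluated like a CASE expression, TRY_PARSE generally only runs for the rows where TRY_CONVERT returned NULL.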

Another thing to look out for is that if your ETL involves connections to SQL Azure, you may have to replace TRY_CONVERT with TRY_PARSE or TRY_CAST, as TRY_CONVERT was not supported in Azure at the time of writing. Finally, although the biggest advantage of TRY_PARSE, TRY_CAST and TRY_CONVERT over their PARSE, CAST and CONVERT counterparts is that they return NULL instead of throwing an error whenever they cannot convert a value, this is not always true: TRY_CAST and TRY_CONVERT still raise an error when the conversion between the two data types is explicitly not permitted. For instance, executing Script 4 causes the following error:

Msg 529, Level 16, State 2, Line 2
Explicit conversion from data type int to xml is not allowed.

SELECT TRY_CAST(99 AS XML), TRY_CONVERT(XML, 99);

Script 4
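
The distinction can be checked in a single batch. In this sketch, TRY_CAST on an unconvertible string returns NULL, while the disallowed int-to-XML conversion from Script 4 is run through sp_executesql so that the outer CATCH block can trap its error number even if the error is raised while the inner statement is compiled:

```sql
-- An unconvertible value: TRY_CAST returns NULL and no error is raised.
SELECT TRY_CAST('abc' AS INT) AS [Unconvertible Value];

DECLARE @DisallowedError INT = 0;

BEGIN TRY
    -- A disallowed conversion: TRY_CAST raises an error instead of returning NULL.
    -- Dynamic SQL lets the outer CATCH block intercept it.
    EXEC sp_executesql N'SELECT TRY_CAST(99 AS XML);';
END TRY
BEGIN CATCH
    SET @DisallowedError = ERROR_NUMBER();  -- 529, as in the message above
END CATCH;

SELECT @DisallowedError AS ErrorNumber;
```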

Summary

Some failures during an ETL load shouldn’t prevent the rest of the ETL from executing, and type conversion errors are one such failure that should be avoided whenever possible. In this article we have demonstrated that the SQL TRY functions TRY_PARSE, TRY_CAST and TRY_CONVERT make it easy for ETL developers to avoid type conversion errors that could otherwise break an entire ETL run.

Source: https://www.sqlshack.com/etl-optimization-using-sql-server-try-functions/
