Using SSIS ForEach Loop Containers to Process Files in Date Order

One positive outcome of a recent project, in which I rewrote one of the Data Marts in our Data Warehouse environment, was confirmation of a long-held suspicion about the behavior of the SQL Server Integration Services (SSIS) ForEach Loop Container: its ForEach File Enumerator type does not process time-stamped text files in the order a person would consider correct. For instance, Figure 1 shows a list of text files containing data on the marital statuses of FIFA 2016 Ballon D’Or nominees.

The data contained in the files created in the morning of June 30th (suffixed with “AM”) is similar – with Lionel Messi’s marital status set to Single as shown in Figure 2.

Later that day, Lionel Messi got married, and as a result the “2PM” file contains a change to Lionel Messi’s marital status, as shown in Figure 3.

It is interesting to note that the default sort order of these text files in Windows Explorer is by file name – which looks to be incorrect as the file suffixed with “2PM” is listed ahead of the “7AM” file. This means that if we were to load data from these files into a Type 2 Marital Status dimension, the latest version of the data would come from the “7AM” file.

The correct order of processing these text files is to have them sorted by date modified as shown in Figure 4 wherein the “2PM” file will be the last one to be imported.
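The difference between the two orderings can be illustrated with a small C# sketch. The file names and modified times below are hypothetical stand-ins for the files in Figure 1 (the real names may differ), and `SortOrderDemo` is an illustrative class, not part of the SSIS package. Sorting by name places the “2PM” file first because the character `2` sorts before `6` and `7`, whereas sorting by modified time places it last.

```csharp
using System;
using System.Linq;

public static class SortOrderDemo
{
    // Hypothetical file names and modified times (assumptions for illustration only).
    public static readonly (string Name, DateTime Modified)[] Files =
    {
        ("MaritalStatus_FIFABallonDOr_6AM.txt", new DateTime(2016, 6, 30, 6, 0, 0)),
        ("MaritalStatus_FIFABallonDOr_7AM.txt", new DateTime(2016, 6, 30, 7, 0, 0)),
        ("MaritalStatus_FIFABallonDOr_2PM.txt", new DateTime(2016, 6, 30, 14, 0, 0)),
    };

    // Order by file name, character by character.
    public static string[] ByName() =>
        Files.OrderBy(f => f.Name, StringComparer.OrdinalIgnoreCase)
             .Select(f => f.Name)
             .ToArray();

    // Order by modified time, oldest first.
    public static string[] ByModified() =>
        Files.OrderBy(f => f.Modified)
             .Select(f => f.Name)
             .ToArray();

    public static void Main()
    {
        Console.WriteLine("By name:     " + string.Join(", ", ByName()));
        Console.WriteLine("By modified: " + string.Join(", ", ByModified()));
    }
}
```

In the name ordering the “7AM” file is processed last, so its stale value would win; in the modified-time ordering the “2PM” file is processed last.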

Processing of Text Files by File Name

The screenshots in Figures 1 and 4 show that the file listing appears differently depending on whether you sort by file name or by date modified. Whilst Windows Explorer can sort the listing in multiple ways, the ForEach File Enumerator type appears to process text files only in file name order which, as Figure 1 indicated, is incorrect here. To demonstrate this, we make use of the sample SSIS package shown in Figure 5. The package begins by using an Execute SQL Task to clear the staging table. The next step uses a Data Flow Task inside a ForEach Loop Container that iteratively loads the text files.

As shown in Figure 6, the ForEach Loop Container is configured to use the ForEach File Enumerator type, and it processes files whose names match MaritalStatus_FIFA*.

Following the successful execution of the SSIS package shown in Figure 5, we are able to view all data that was imported into the staging table as shown in Figure 7. As already predicted, the ForEach loop container using ForEach File Enumerator type processed the files in a file name order. This is incorrect as the latest record for Lionel Messi (at line 5 in Figure 7) is loaded ahead of the 7AM file.

Processing of Text Files by File Creation Time

The danger of relying on the ForEach File Enumerator type is that we have no control over the order in which files are processed. We can get around this limitation in two ways:

  1. Renaming Text Files to Military Time

    The simplest way of getting your time-stamped text files processed in the correct order is to adopt a file naming convention that uses military (24-hour) time instead of the 12-hour clock. As can be seen in Figure 8, renaming the hour part of the file names to military time results in the files being listed in the correct order in Windows Explorer.

    Figure 8: Text files with Military Time

    Following the SSIS package execution, the data in our staging table is updated as shown in Figure 9. As it can be seen, the processed text file contains accurate marital status for Lionel Messi.

    Figure 9: Data in Staging Table in the correct order

    One significant limitation of the military time approach is that, as SSIS developers, we often have no control over how the text files are named. I recall several instances in which my SSIS solution processed files that were prepared and dumped to an FTP location by a legacy third-party program. In such instances, you are usually given only read permission on the FTP location and are thereby prevented from renaming the files.

  2. Processing Text Files using Foreach ADO Enumerator

    The recommended approach for processing multiple time-stamped text files is to use the Foreach ADO Enumerator type instead of the ForEach File Enumerator. The switch requires several changes to your SSIS package, as shown in Figure 10. Again, the first step uses an Execute SQL Task to clear the staging tables.

    Figure 10: SSIS Package using ADO Enumerator

    I then use a Script Task (ST – Populate ListOfFiles) that uses methods from LINQ to sort text files by creation time and insert the output into a staging table. The main code of the Script Task is shown in Script 1.

    
    // Requires: using System.Data.SqlClient; using System.IO; using System.Linq;
    public void Main()
    {
        string query = "INSERT INTO dbo.ListOfFiles (FileName) VALUES (@FileName)";

        // Matching files in C:\temp, ordered from oldest to newest creation time.
        var sorted = Directory.GetFiles(@"C:\temp", "Marit*")
                              .OrderBy(f => new FileInfo(f).CreationTime);

        using (SqlConnection cnn = new SqlConnection(
            "Data Source=localhost;Initial Catalog=SQLShack;Integrated Security=SSPI;"))
        {
            cnn.Open();

            // Insert one row per file, in sorted order.
            foreach (string file in sorted)
            {
                using (SqlCommand myCommand = new SqlCommand(query, cnn))
                {
                    myCommand.Parameters.AddWithValue("@FileName", Path.GetFileName(file));
                    myCommand.ExecuteNonQuery();
                }
            }
        }

        Dts.TaskResult = (int)ScriptResults.Success;
    }

    Script 1

    The Execute SQL Task (EST – Populate Object Variable) retrieves the list built by the Script Task and stores it in a package variable of the Object data type. As shown in Figure 11, the ForEach loop container is then configured to use the Foreach ADO Enumerator type and sources its data from that object variable, varObj.

    Figure 11: ForEach ADO Enumerator type

    The rest of the settings inside the ForEach loop container are similar to the package using the Foreach File Enumerator. Following the package execution, the data stored in the staging table will be similar to what is shown in Figure 9.
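The two workarounds above can be sketched in C# as follows. This is a minimal illustration under stated assumptions, not code from the package: the class and method names are hypothetical, the `_7AM` / `_2PM` suffix layout is assumed because the exact file names are not given, and the second method orders by last-write time (the “Date modified” column in Windows Explorer), whereas Script 1 orders by `CreationTime`. Which timestamp is appropriate depends on which one the source system preserves.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileOrderSketch
{
    // Approach 1: rewrite a trailing hour-clock suffix such as "_7AM" or "_2PM"
    // as a zero-padded 24-hour value ("_0700", "_1400"). The suffix layout is an
    // assumption; names without such a suffix are returned unchanged.
    public static string ToMilitaryTime(string fileName)
    {
        return Regex.Replace(
            fileName,
            @"_(\d{1,2})(AM|PM)(?=\.|$)",
            m =>
            {
                int hour = int.Parse(m.Groups[1].Value) % 12; // 12 maps to 0 before the PM offset
                if (m.Groups[2].Value.Equals("PM", StringComparison.OrdinalIgnoreCase))
                {
                    hour += 12;
                }
                return "_" + hour.ToString("00") + "00";
            },
            RegexOptions.IgnoreCase);
    }

    // Approach 2: list matching file names ordered by last-write time, oldest
    // first, with the file name as a tie-breaker.
    public static string[] OrderByLastWrite(string directory, string pattern)
    {
        return new DirectoryInfo(directory)
            .GetFiles(pattern)
            .OrderBy(f => f.LastWriteTimeUtc)
            .ThenBy(f => f.Name, StringComparer.OrdinalIgnoreCase)
            .Select(f => f.Name)
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(ToMilitaryTime("MaritalStatus_FIFABallonDOr_7AM.txt"));
        Console.WriteLine(ToMilitaryTime("MaritalStatus_FIFABallonDOr_2PM.txt"));
    }
}
```

With zero-padded 24-hour suffixes, a plain name sort already matches chronological order within a day, which is why the renaming approach works with the ForEach File Enumerator unchanged.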

Conclusion

If your SSIS solution does not process multiple text files using the Foreach File Enumerator type, then you are probably not affected by the issue that has been discussed. However, for those dealing with multiple text files, consider switching over to the Foreach ADO Enumerator type.

Downloads

  • SQLShackETL
  • MaritalStatus_FIFABallonDOr_HourClock
  • MaritalStatus_FIFABallonDOr_MilitaryTime

References

  • Foreach Loop Container
  • LINQ – Overview
  • Execute SQL Task

Translated from: https://www.sqlshack.com/using-ssis-foreach-loop-containers-process-files-date-order/
