Original article:

http://www.informit.com/articles/article.aspx?p=1554201&seqNum=2

Recommended Practices with Partitions and Aggregations

The following sections offer recommendations for working with partitions and aggregations.

#1: Abide by Prescribed Limits for Partition Sizes

Microsoft recommends limiting each partition to at most 20 million rows or a file size of at most 1GB. Some applications use much larger partitions, perhaps because data is partitioned only by month or by day. Keep in mind that during processing, MSAS has to read an entire partition's data on a single thread. It is often more efficient to process five partitions of 20 million rows in parallel than to process a single partition with 100 million rows. Furthermore, if you partition your data according to the typical query patterns, you will see far better query performance than if your measure group had a single large partition.
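The row limit above turns into simple arithmetic when planning a partitioning scheme. The following is a minimal sketch, not part of any MSAS tooling: the function name `partitions_needed` is hypothetical, and the 20 million figure is just the guideline quoted above, exposed as a parameter.

```python
# Guideline quoted in the text above; treat it as a tunable assumption.
MAX_ROWS_PER_PARTITION = 20_000_000


def partitions_needed(total_rows: int, max_rows: int = MAX_ROWS_PER_PARTITION) -> int:
    """Return the smallest number of partitions such that none exceeds max_rows."""
    if total_rows < 0 or max_rows <= 0:
        raise ValueError("total_rows must be >= 0 and max_rows must be > 0")
    # Integer ceiling division; an empty measure group still needs one partition.
    return max(1, (total_rows + max_rows - 1) // max_rows)


# The 100-million-row case from the text splits into five partitions.
print(partitions_needed(100_000_000))  # 5
```

The result is only a lower bound on the partition count; the actual split should still follow the query patterns, as discussed next.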

For example, suppose your measure group can be partitioned by year and by product category, and you have data for five years and three categories (bikes, accessories and clothing). If you store all data in a single 15GB partition, every query will have to examine this 15GB data file (presuming the data is not found in an MSAS storage engine cache and no useful aggregations exist for resolving the query). Now split the data into 15 partitions of 1GB each, one for each combination of year and category. A query examining bike sales for 2009 will only have to read a single 1GB file. Scanning 15GB of data will invariably be slower than scanning a 1GB file. Many people feel that they will end up with too many partitions if they partition data on any dimension other than a time or date dimension. This simply is not true. Theoretically, the limit on the number of partitions per cube is 2 billion, and most cubes will have far fewer partitions. So go ahead and partition by multiple hierarchies when possible to match the pattern of data retrieval.
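The effect described above can be illustrated with a toy model. This sketch does not call Analysis Services; it only counts how much data a query would touch if the engine reads exactly the partitions whose year and category match. The specific years (2005 to 2009), the partition naming scheme and the uniform 1GB size are assumptions made for illustration.

```python
from itertools import product

# Assumed values: the text says five years and three categories of about 1GB each.
YEARS = [2005, 2006, 2007, 2008, 2009]
CATEGORIES = ["bikes", "accessories", "clothing"]
PARTITION_SIZE_GB = 1

# One partition per (year, category) combination.
partitions = [
    {"name": f"{category}_sales_{year}", "year": year,
     "category": category, "size_gb": PARTITION_SIZE_GB}
    for year, category in product(YEARS, CATEGORIES)
]


def partitions_to_scan(year=None, category=None):
    """Return the partitions a query must read; None means 'no filter'."""
    return [
        p for p in partitions
        if (year is None or p["year"] == year)
        and (category is None or p["category"] == category)
    ]


def gb_scanned(year=None, category=None):
    """Total size of the partitions that match the query filters."""
    return sum(p["size_gb"] for p in partitions_to_scan(year, category))


print(len(partitions))                            # 15
print(gb_scanned(year=2009, category="bikes"))    # 1
print(gb_scanned())                               # 15
```

A query filtered on only one of the two hierarchies still benefits: filtering on year alone touches three partitions, and filtering on category alone touches five.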

You have a couple of options for populating measure groups partitioned by multiple hierarchies. You could define a separate view in the relational data source for each partition, with each view retrieving only a portion of the fact table's data. Alternatively, you could bind each partition's definition to a different query. Personally, I prefer the second option, particularly for environments where I don't have direct access to make schema changes in the relational source.
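For the query-binding option, the per-partition queries are usually identical except for the WHERE clause, so they can be generated. The sketch below only builds SQL strings; the table name `dbo.FactSales` and the columns `CalendarYear` and `ProductCategory` are hypothetical placeholders for whatever the relational source actually uses.

```python
from itertools import product

# Hypothetical names; substitute the real fact table and columns.
FACT_TABLE = "dbo.FactSales"
YEAR_COLUMN = "CalendarYear"
CATEGORY_COLUMN = "ProductCategory"


def partition_query(year: int, category: str) -> str:
    """Build the source query for one (year, category) partition."""
    safe_category = category.replace("'", "''")  # escape quotes in the SQL literal
    return (
        f"SELECT * FROM {FACT_TABLE} "
        f"WHERE {YEAR_COLUMN} = {int(year)} "
        f"AND {CATEGORY_COLUMN} = '{safe_category}'"
    )


def partition_queries(years, categories):
    """Map a partition name to its source query for every combination."""
    return {
        f"{category}_sales_{year}": partition_query(year, category)
        for year, category in product(years, categories)
    }


queries = partition_queries(range(2005, 2010), ["bikes", "accessories", "clothing"])
print(queries["bikes_sales_2009"])
```

Whichever option you choose, the filters must not overlap, otherwise the same fact rows would be loaded into more than one partition and counted twice.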

#2: Define the Slice Property for Every Partition

Much like dimensions, each partition has several properties that should be carefully examined and configured appropriately. Although some literature advises that setting the partition slice property is unnecessary for MOLAP partitions, do yourself a favor and set this property for every partition. At query time, MSAS checks partition XML files (these files are called info.version_number.xml) for internal data IDs identifying data ranges for each dimension attribute. At times, if the partition slice isn't defined, you will notice that MSAS reads more partitions than necessary to resolve a query. For example, instead of reading only the bike_sales_2004 partition, Analysis Services may also read the clothing_sales_2005 partition if the slice property isn't set, even if the query only requested data for bike sales in 2004. Reading a single partition will be faster than reading multiple partitions.
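A slice is written as an MDX expression naming the members a partition contains. As a rough sketch, the helper below assembles such a tuple for a year and category partition. The dimension and attribute names (`Date`, `Calendar Year`, `Product`, `Category`) and the member keys are hypothetical; the real names and keys depend on your cube.

```python
def member(dimension: str, attribute: str, key) -> str:
    """Format an MDX member reference by key, e.g. [Date].[Calendar Year].&[2004]."""
    return f"[{dimension}].[{attribute}].&[{key}]"


def slice_expression(year: int, category_key: str) -> str:
    """Build an MDX tuple describing a (year, category) partition slice."""
    # Dimension and attribute names below are assumptions for illustration.
    members = [
        member("Date", "Calendar Year", year),
        member("Product", "Category", category_key),
    ]
    return "(" + ", ".join(members) + ")"


# Slice for the bike_sales_2004 partition mentioned above.
print(slice_expression(2004, "Bikes"))
```

The slice has to describe the same rows that the partition's source query or view returns; a slice that disagrees with the data is a configuration error rather than an optimization.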

#3: Use the Aggregation Manager Sample Tool for Designing Custom Aggregations

Microsoft re-engineered the Usage-Based Optimization (UBO) Wizard with Analysis Services 2008 because with version 2005, it wasn’t always effective; at times, the wizard would not create useful aggregations even if you chose a 100 percent performance improvement goal. Business Intelligence Development Studio (BIDS) 2008 also offers the ability to pick and choose which attributes should be included in a specific aggregation through the advanced view within the Aggregations tab. SQL Server Management Studio (SSMS) allows scripting aggregation designs. However, neither BIDS nor SSMS wizards allow crafting aggregations for specific queries.

If you find that UBO does not meet your needs, download the Aggregation Manager sample tool. The tool is easy to use and works with both the 2005 and 2008 versions. First, clear the query log; next, execute the query workload for which you would like to tune performance; then build aggregations based on the query log. You may want to use the eliminate redundancy and remove duplicates options within Aggregation Manager so that you don't end up with too many aggregations.

NOTE

Aggregation Manager only creates the aggregation design; you will need to process partitions using the ProcessIndexes option to actually create the aggregation files. After you create aggregations, be sure to re-run your queries and monitor progress report events within SQL Profiler to see whether any of the newly added aggregations aren't being used.

You can have multiple aggregation designs per measure group. For example, you could have one aggregation design with many aggregations for frequently accessed partitions; another aggregation design containing only a few aggregations could be applied to rarely queried historical partitions.

#4: Use Separate Measure Groups for Distinct Count Measures

This recommendation is well documented but not always followed. BIDS automatically assigns a measure with the distinct count aggregation function to its own separate measure group. However, if you create a measure with sum, count or another aggregation function and later change its aggregation function to distinct count, then BIDS will allow you to shoot yourself in the foot. Fortunately, BIDS 2008 does warn developers of such mistakes.

#5: Specify the Maximum Degree of Parallelism for Processing Objects on Multi-Processor Servers

By default, MSAS decides the appropriate degree of parallelism for processing operations. However, on multi-processor hosts you may find that the software sometimes attempts to do more than it can handle. Processing too many partitions in parallel may also place an unbearable load on the relational database server. Fortunately, you can override the default and specify the degree of parallelism for processing operations through XMLA commands.
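In XMLA, the cap is expressed with the MaxParallel attribute on the Parallel element of a Batch command. The sketch below builds such a command with Python's standard library so it can be reviewed or saved before being run through SSMS or another XMLA client; it does not connect to a server. The function name and all object IDs (`SalesDB`, `Sales`, `FactSales` and the partition names) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Analysis Services XMLA engine commands.
XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def build_process_batch(database_id, cube_id, measure_group_id,
                        partition_ids, max_parallel, process_type="ProcessFull"):
    """Return an XMLA Batch that processes partitions with a parallelism cap."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    ET.register_namespace("", XMLA_NS)  # serialize with a default xmlns

    def q(tag):
        return f"{{{XMLA_NS}}}{tag}"

    batch = ET.Element(q("Batch"))
    parallel = ET.SubElement(batch, q("Parallel"),
                             {"MaxParallel": str(max_parallel)})
    for partition_id in partition_ids:
        process = ET.SubElement(parallel, q("Process"))
        obj = ET.SubElement(process, q("Object"))
        for tag, value in (("DatabaseID", database_id),
                           ("CubeID", cube_id),
                           ("MeasureGroupID", measure_group_id),
                           ("PartitionID", partition_id)):
            ET.SubElement(obj, q(tag)).text = value
        ET.SubElement(process, q("Type")).text = process_type
    return ET.tostring(batch, encoding="unicode")


# Hypothetical IDs: process three partitions, at most two at a time.
xmla = build_process_batch(
    "SalesDB", "Sales", "FactSales",
    ["bikes_sales_2009", "clothing_sales_2009", "accessories_sales_2009"],
    max_parallel=2,
)
print(xmla)
```

Passing `process_type="ProcessIndexes"` produces the kind of command needed to build aggregation files after an aggregation design changes, as mentioned in recommendation #3.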

Reposted from: https://www.cnblogs.com/keepmove/p/4922663.html
