SSIS Tuning: DefaultBufferSize
A good and perennially relevant primer on default buffer size:
1. https://www.desertislesql.com/wordpress1/?p=318
Update from the comments: the maximum buffer size was increased from 100 MB (through SQL Server 2014) to 2^31-1 bytes (2 GB – 1 byte) in SQL Server 2016.
https://docs.microsoft.com/en-us/sql/integration-services/data-flow/data-flow-performance-features
Denzil Ribeiro discusses the columnstore motivation for the SSIS 2016 max buffer size increase here.
https://blogs.msdn.microsoft.com/sqlcat/2016/02/29/sql-server-2016-ssis-data-flow-buffer-auto-sizing-capability-benefits-data-loading-on-clustered-columnstore-tables/
2. More detail on DefaultBufferSize tuning: http://www.sqlis.com/post/Log-Events-and-Pipeline-Events.aspx
3. Even more detail, on the AutoAdjustBufferSize property: https://www.mssqltips.com/sqlservertip/4221/improving-data-flow-performance-with-ssis-autoadjustbuffersize-property/
Article 1:
SSIS Tuning – What Size is my row, your DIY Buffer Sizing Guide
"When looking to improve the performance of an SSIS package, one of the common recommendations listed in many places is to modify your buffers so that you can load as much data into a single buffer size as possible. According to Microsoft, for optimal performance, SSIS should be configured for maximum memory utilization by having buffers with as many rows as possible without exceeding the internal 100 MB limit. Ok now that you know what Microsoft says about the topic, how does one put it in practice? I thought it might be nice to post the How-To as it is a little complicated.
Buffer Tuning Parameters
There are a couple of default settings that are used for loading data into memory and they all need to line up correctly to keep data from being sent from memory to disk. In order to set the size appropriately, it is best to apply the Price Is Right Axiom.
DefaultBufferSize – the default is 10 MB. The maximum size is 100 MB, which SSIS stores as MaxBufferSize. This value can never be any bigger than 100 MB, so size it appropriately for its environment.
DefaultBufferMaxRows – the default is 10,000 rows. This setting needs to be set the same way that you win on The Price Is Right: the max buffer rows times the row size needs to be as close to DefaultBufferSize as possible without going over.
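As a rough illustration of the rule (a Python sketch, not anything SSIS provides), you can check whether a given rows setting fits the buffer and how much of it the setting actually uses; with the 146-byte row computed later in the article, the 10,000-row default fills only a fraction of the 10 MB default buffer:

```python
# Sketch (not an SSIS API): check a DefaultBufferMaxRows setting against
# the buffer size, Price Is Right style: as close as possible, never over.

def buffer_utilization(row_size_bytes, max_rows, buffer_size_bytes):
    """Return (fits, fraction_of_buffer_used) for a given configuration."""
    used = row_size_bytes * max_rows
    return used <= buffer_size_bytes, used / buffer_size_bytes

# SSIS defaults: 10,000 rows and a 10 MB buffer, with a 146-byte row.
fits, used = buffer_utilization(146, 10_000, 10 * 1024 * 1024)
print(fits, round(used, 2))  # True 0.14 – the defaults leave most of the buffer empty
```

In other words, with defaults the engine moves buffers that are mostly empty, which is exactly what the tuning below is meant to fix.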
Size Matters
To figure out what to put in DefaultBufferMaxRows, one needs to know how big the rows are. Remember, to win the value must be as close to DefaultBufferSize as possible without going over. If the buffer is sized too large, the package will be slower, as rows will be cached to disk, so make sure you don't miss by even a little.
How big is the row size? There are a number of ways of figuring this out. You can look at each field in the query SSIS is loading and, based on the data types, add up the sizes of all the fields. Fortunately that isn't the only way to figure out the row size. If your data source is SQL Server, you are in luck, as the system tables can help determine the size. Here is a sample query, assuming your table name is Address, which you can run on the AdventureWorks database.
Exec sp_spaceused 'Person.Address'
The Results are
name     rows   reserved  data     index_size  unused
Address  19614  5960 KB   2784 KB  2688 KB     488 KB
To figure out the row size for this table, take (data × 1024) / rows. Using the values above: 2784 × 1024 / 19614 = 145.35, or 146 bytes per row. If you set DefaultBufferSize to 100 MB, which is the maximum and what I recommend in most cases, the buffer size is 104857600 bytes. Buffer size / row size = DefaultBufferMaxRows: 104857600 / 146 = 718202.73, so set DefaultBufferMaxRows to 718202.
If you are using the columns, you can get the same information by looking at syscolumns. By using the column lengths, it is relatively easy to figure out what the appropriate size of your buffer should be: just add them up. One word of caution. I do not wish to imply that because the information is available on a per-table basis one should pick Table or View in the SSIS source. Au contraire. Always access the data by using a Select statement, as it performs better.
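The arithmetic above can be sketched in Python (an illustration only, using the Person.Address figures returned by sp_spaceused):

```python
import math

# Derive DefaultBufferMaxRows from sp_spaceused output:
# row size = (data column in KB * 1024) / row count, rounded up;
# max rows = buffer size / row size, rounded down so we never go over.
data_kb, row_count = 2784, 19614                    # 'data' and 'rows' columns
row_size = math.ceil(data_kb * 1024 / row_count)    # 146 bytes per row

buffer_size = 100 * 1024 * 1024                     # 100 MB maximum = 104857600 bytes
max_rows = buffer_size // row_size

print(row_size, max_rows)  # 146 718202
```

Rounding the row size up and the row count down keeps the estimate on the safe side of the limit.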
Validate
To ensure that you have actually improved buffer performance, check it. After you are done with the settings, enable logging on the Data Flow task and select the BufferSizeTuning event to see how many rows are contained in each buffer.
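If you capture those log messages as text, a small hypothetical helper (the regex assumes the BufferSizeTuning wording shown later in these notes) can pull out the rows-per-buffer figure for comparison against your DefaultBufferMaxRows setting:

```python
import re

# Hypothetical helper (not part of SSIS): extract the rows-per-buffer
# number from a BufferSizeTuning log message.
def rows_per_buffer(message):
    m = re.search(r"only (\d+) rows", message)
    return int(m.group(1)) if m else None

msg = ("Rows in buffer type 3 would cause a buffer size greater than the "
       "configured maximum. There will be only 2440 rows in buffers of this type.")
print(rows_per_buffer(msg))  # 2440
```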
Please feel free to drop me a line if you find this helpful.
Yours Always
Ginger Grant"
Article 2:
Log Events and Pipeline Events
I was about to write about data flow buffer size configuration properties but need to get this one out first so that we can refer to it.
The Data Flow task (internally and in the object model also called the 'Pipeline') logs some pretty interesting information that describe the internal scheduler.
To see these messages, one way is:
1. In a package you want to get detailed pipeline logs on, select the 'Logging...' menu option on the control flow.
2. Check the Data Flow task of interest on the tree on the left. Switch over to Details and select the following events:
BufferSizeTuning; PipelineExecutionTrees; PipelineExecutionPlan; PipelineInitialization
3. Run the package, then launch the 'Log Events' window from the Control Flow context menu again.
A new tool window will pop up and get filled with events.
BufferSizeTuning
Example:
Rows in buffer type 3 would cause a buffer size greater than the configured maximum. There will be only 2440 rows in buffers of this type.
Next post we will talk about what this means.
PipelineExecutionTrees
Example:
begin execution tree 0
   output "Aggregate Output 1" (779)
   input "Data Conversion Input" (793)
   output "Data Conversion Output" (795)
   input "OLE DB Destination Input" (827)
end execution tree 0
begin execution tree 1
   output "Flat File Source Output" (1087)
   input "Data Conversion Input" (958)
   output "Data Conversion Output" (959)
   input "Lookup Input" (117)
   output "Lookup Output" (118)
   input "Sort Input" (470)
end execution tree 1
begin execution tree 2
   output "Sort Output" (471)
   input "Multicast Input 1" (668)
   output "Multicast Output 2" (672)
   input "OLE DB Destination Input" (643)
   output "Multicast Output 3" (674)
   input "Aggregate Input 1" (776)
end execution tree 2
We've discussed execution trees in the past... they start from a source or a component with asynchronous output and go on until a destination or another asynchronous output. Looks like I have 3 execution trees in my package.
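As a quick aid when reading captured logs (a hypothetical helper, not part of SSIS), the trees can be counted from the "begin execution tree N" markers:

```python
import re

# Hypothetical helper: count execution trees in a PipelineExecutionTrees
# log entry by counting its "begin execution tree N" markers.
def count_execution_trees(log_text):
    return len(re.findall(r"begin execution tree \d+", log_text))

log = """begin execution tree 0
end execution tree 0
begin execution tree 1
end execution tree 1
begin execution tree 2
end execution tree 2"""
print(count_execution_trees(log))  # 3
```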
PipelineExecutionPlan
The plan for my package is too large to paste here. This was hinted at in an earlier post when describing EngineThreads. Soon...
PipelineInitialization
Example:
No temporary BLOB data storage locations were provided. The buffer manager will consider the directories in the TEMP and TMP environment variables.
These are typically informational messages about the directories and other defaults the pipeline will use during its execution.
Reproduced by kind permission of Ashvini Sharma (Microsoft).