Using the Destination Component
After Establishing a Connection to the data source, add the CData Azure Cosmos DB destination component to the workflow to load data into Azure Cosmos DB.
Writing to Azure Cosmos DB in a Data Flow
Follow the steps below to connect to Azure Cosmos DB and update data.
- In the SSIS Toolbox, drag the CData Azure Cosmos DB destination component into the Data Flow Task.
- Connect the output of a source component to the CData Azure Cosmos DB destination component.
- Double-click the CData Azure Cosmos DB destination component. The CData Azure Cosmos DB Destination Editor dialog is displayed.
- In the Connection Managers menu, select an available CData Azure Cosmos DB connection manager, or create a new instance if one is not already available.
- In the "Use a Table" option, select the table to update.
- Select the data manipulation action. See below for more information on each action.
- On the Mappings tab, configure the mappings from source to destination. By default, outputs from the source component are automatically mapped with the same name as the columns in the table you selected. You can further update these selections.
NOTE: Read-only columns are not visible among the destination columns since they cannot be written to.
Command Execution
When you execute the data flow, the component executes one of the following operations to update the destination table.
Insert
The component takes the mapped values and attempts to insert the data as new rows into the table. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the insertion in the error output of the component with the 'Redirect row' error behavior.
Update
The component attempts to update an existing row based on the primary key provided. The primary key column must be mapped, and it must not be null. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the update in the error output of the component with the 'Redirect row' error behavior.
Upsert
The component uses the primary key to decide whether a row should be inserted or updated. If the primary key column is mapped and it is not null, the component attempts to update an existing row based on the primary key provided. If the primary key is not mapped or if it is null, the component attempts to insert the data as a new row. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the upsert in the error output of the component with the 'Redirect row' error behavior.
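The upsert rule above can be sketched as a small decision function. This is an illustrative sketch only; the function name and parameters are assumptions for the example, not part of the component's API:

```python
def choose_upsert_action(primary_key_mapped, primary_key_value):
    # Mirrors the upsert rule described above: update when the primary key
    # column is mapped and non-null; otherwise insert the row as new data.
    if primary_key_mapped and primary_key_value is not None:
        return "UPDATE"
    return "INSERT"
```

For example, a row whose primary key column is mapped but arrives as null is treated as an insert, not an update.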
Delete
The component attempts to delete an existing row based on the primary key provided. The primary key column must be mapped, and it must not be null.
Bulk Operations
By default, the destination component uses bulk operations to update the data source. This behavior is controlled by the BatchMode and BatchSize properties of the component. BatchSize controls the maximum number of rows submitted in a single batch. Depending on the volume of data being submitted, increasing the BatchSize can improve throughput, but at the cost of a larger memory footprint.
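The effect of BatchSize can be sketched as simple row grouping. The function below is an illustrative assumption, not the component's implementation:

```python
def split_into_batches(rows, batch_size):
    # Group rows into batches of at most batch_size rows, the way
    # BatchMode/BatchSize control how many rows are submitted per bulk call.
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
```

With 25 input rows and a batch size of 10, the component would submit batches of 10, 10, and 5 rows.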
In addition to BatchMode and BatchSize, the Data Flow Task exposes two properties that define a global maximum buffer size: DefaultBufferMaxRows and DefaultBufferSize. When performing very large write operations, you should increase these values as well for best performance, since the defaults are quite low.
By default, SSIS limits its own buffer size to 10MB and the row buffer to 10,000 rows. Because of the byte limit, SSIS may send an uneven number of rows to the destination in each buffer, so the batch sizes you observe may not match the row counts you expected.
See Improve Data Flow Performance with SSIS AutoAdjustBufferSize for more information about adjusting buffer properties.
DefaultBufferMaxRows should match or exceed the value you use for the batch size in the destination component, but you may need to test and iterate on DefaultBufferSize to arrive at an appropriate value. For reference, a DefaultBufferSize of 100000000 corresponds to ~100MB, which is a reasonable starting point.
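The interaction of the two limits can be illustrated with a rough estimate: a buffer fills until it hits either the byte cap or the row cap, whichever comes first. The function and row-size figures below are assumptions for illustration only:

```python
def rows_per_buffer(buffer_size_bytes, estimated_row_bytes, buffer_max_rows):
    # An SSIS buffer fills until it reaches either the byte limit
    # (DefaultBufferSize) or the row limit (DefaultBufferMaxRows),
    # whichever comes first.
    return min(buffer_size_bytes // estimated_row_bytes, buffer_max_rows)
```

At the defaults (10MB, 10,000 rows), rows of roughly 2KB each hit the byte limit first, yielding buffers of about 5,242 rows; raising DefaultBufferSize to ~100MB lets the 100,000-row limit apply instead.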
As a side note, SSIS 2016+ offers a package property called AutoAdjustBufferSize that determines an appropriate buffer size automatically. However, this takes several iterations and generally does not perform as well as tuning the values manually, so we recommend the manual approach.