Using the Destination Component
After Establishing a Connection to the data source, add the CData MongoDB destination component to the workflow to load data into MongoDB.
Writing to MongoDB in a Data Flow
Follow the steps below to connect to MongoDB and update data.
- In the SSIS Toolbox, drag the CData MongoDB destination component into the Data Flow Task.
- Connect the output of a source component to the CData MongoDB destination component.
- Double-click the CData MongoDB destination component. The CData MongoDB Destination Editor dialog is displayed.
- In the Connection Managers menu, select an available CData MongoDB connection manager, or create a new one if one is not already available.
- Set Use a Table to the table you want to update.
- Set Action to a data manipulation action. See below for more information on each action.
- Select the Columns tab to discover the available columns for the table you selected in Use a Table.
- On the Mappings tab, configure the mappings from source to destination. See below for more information.
QueryPassthrough Support
NOTE: For performance reasons, MongoDB does not support QueryPassthrough when using the destination component. Make sure you use a connection with QueryPassthrough set to False to use this component.
Column Configuration
You can edit the name, data types, length, precision, and scale of your destination columns in the Destination component's Columns tab.
Select a property in the list to edit it. You can also add, reorder, and remove columns using the buttons near the bottom of the interface.
If you want to revert any changes you have made to the columns to their defaults, click Refresh. Note that this will also delete any new columns you have added.
Column Mapping
In the Destination component's Mappings tab, you can map columns from the output of the inbound source component to columns in the table specified in the destination component.
Note: Opening the Columns tab retrieves metadata for the selected destination table. You must select this tab before columns will appear in the Mappings tab.
The Mappings tab is broken up into two tabs: TableView and DiagramView.
TableView
The TableView tab represents column mappings as a table with the following columns. It displays extra information not visible on the DiagramView tab.
- Available Input Columns: Select a column from the input columns to map to a destination column.
- If you opened the Columns tab before opening the Mappings tab for the first time, the input columns are autopopulated.
- Available Destination Columns: Displays the column from the destination columns that the input column maps to.
- Data Type: Displays the data type of the destination column.
- Column Size: Displays the column size of the destination column.
- Mapped: Toggles whether the current mapping is active.
You can also filter the displayed columns using the Filter box and the Read-only columns, Mapped columns, and Unmapped columns checkboxes at the top of the tab.
DiagramView
The DiagramView tab provides a visual representation of the column mappings.
Drag an input column's name from the Available Input Columns box to a column in the Available Destination Columns box to create a mapping. Each active mapping is represented by a line between the input column name and destination column name.
Command Execution
When you execute the data flow, the component executes one of the following operations to update the destination table.
Insert
The component takes the mapped values and attempts to insert the data as new rows into the table. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the insertion in the error output of the component with the Redirect row error behavior.
Update
The component attempts to update an existing row based on the primary key provided. The primary key column must be mapped, and it must not be null. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the update in the error output of the component with the Redirect row error behavior.
Upsert
The component uses the primary key to decide if a row is to be inserted or updated. If the primary key column is mapped and it is not null, the component attempts to update an existing row based on the primary key provided. If the primary key is not mapped or if it is null, the CData MongoDB Destination Component attempts to insert the data as a new row. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the upsert in the error output of the component with the Redirect row error behavior.
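The upsert rule above can be sketched in a few lines of Python. This is only an illustration of the decision as described here, not the component's actual implementation; the function name, the row dictionaries, and the use of _id as the key column are assumptions made for the example.

```python
from typing import Any, Mapping, Optional


def choose_upsert_operation(row: Mapping[str, Any], key_column: Optional[str]) -> str:
    """Return "update" or "insert" for one row (hypothetical helper).

    key_column is the name of the mapped primary key column, or None when
    the primary key is not mapped at all.
    """
    # Update only when the key is mapped AND the row carries a non-null value.
    if key_column is not None and row.get(key_column) is not None:
        return "update"
    # Key not mapped, or null for this row: treat the row as a new record.
    return "insert"


# Illustrative rows; "_id" is assumed to be the primary key column.
rows = [
    {"_id": "abc123", "name": "Ada"},   # key present
    {"_id": None, "name": "Grace"},     # key null
    {"name": "Linus"},                  # key missing from the row
]
operations = [choose_upsert_operation(r, "_id") for r in rows]
```

Only the first row is routed to an update; the other two are treated as inserts because the key is null or absent.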
Delete
The component attempts to delete an existing row based on the primary key provided. The primary key column must be mapped, and it must not be null.
Bulk Operations
By default, the destination component uses bulk operations to update the data source. This behavior is controlled by the BatchMode and BatchSize properties in the Properties pane of the destination component. The BatchSize controls the maximum size of the batches to submit to the component at once. Depending on the volume of data being submitted, increasing the BatchSize can improve throughput but uses more memory.
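To make the effect of BatchSize concrete, the sketch below splits a set of rows into batches no larger than a given size. It is a simplified model of batching in general, written under the assumption that batches are filled in order; the helper name and the sample row counts are hypothetical and do not come from the component.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size rows (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# 2,500 illustrative rows with an assumed batch size of 1,000.
batches = list(split_into_batches(list(range(2500)), 1000))
batch_lengths = [len(b) for b in batches]
```

A larger batch size means fewer round trips for the same number of rows, at the cost of holding more rows in memory at once.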
In addition to BatchMode and BatchSize, there are separate properties, in the Properties pane of the Data Flow Task, that define a global maximum size, DefaultBufferMaxRows and DefaultBufferSize. When performing very large write operations, you should increase the values for these as well for best performance, since the default values are very low.
By default, SSIS limits each buffer to 10 MB (DefaultBufferSize) and 10,000 rows (DefaultBufferMaxRows). Because of the limit on the number of bytes, SSIS may send an unexpected number of rows to the destination at a time, so the counts you see may not match what you expected based on the row limit alone.
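As a rough way to reason about how the two limits interact, the sketch below estimates the rows per buffer as the smaller of the row limit and the number of rows that fit in the byte limit. This is a simplified model, not the exact SSIS algorithm; the function name and the row widths are assumptions chosen for illustration.

```python
def estimated_rows_per_buffer(
    row_width_bytes: int,
    default_buffer_max_rows: int = 10_000,          # SSIS default row limit
    default_buffer_size: int = 10 * 1024 * 1024,    # SSIS default byte limit (10 MB)
) -> int:
    """Estimate rows per buffer under a simplified model (hypothetical helper)."""
    if row_width_bytes <= 0:
        raise ValueError("row_width_bytes must be a positive integer")
    # Rows that fit within the byte limit for this estimated row width.
    rows_by_size = default_buffer_size // row_width_bytes
    # The effective count is capped by whichever limit is reached first.
    return max(1, min(default_buffer_max_rows, rows_by_size))


# Wide rows (assumed 2,000 bytes each): the byte limit is reached first.
wide = estimated_rows_per_buffer(2000)
# Narrow rows (assumed 500 bytes each): the row limit is reached first.
narrow = estimated_rows_per_buffer(500)
# Raising both limits, for example to 50,000 rows and 100,000,000 bytes.
tuned = estimated_rows_per_buffer(
    2000, default_buffer_max_rows=50_000, default_buffer_size=100_000_000
)
```

Under this model, wide rows yield fewer rows per buffer than the row limit, which is one reason the destination can receive a row count that differs from the configured maximum.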
See Improve Data Flow Performance with SSIS AutoAdjustBufferSize for more information about adjusting buffer properties.
DefaultBufferMaxRows should match or exceed the value you use for the batch size in the destination component, but you may need to test and iterate on DefaultBufferSize to arrive at an appropriate value. For reference, the value of 100000000 for DefaultBufferSize corresponds to ~100MB, which you can use as a starting point.
As a side note, in SSIS 2016 and later the Data Flow Task's properties list also includes AutoAdjustBufferSize, which automatically determines an appropriate buffer size. However, this takes several iterations and generally does not perform as well as tuning the values manually, so manual tuning is typically the better choice.
Create New Table
A new table can be generated in the destination database by clicking the New... button next to Use a table. This opens a dialog with a generated CREATE TABLE query. The default table name is "CData Destination" and the column definitions are taken from the input to the destination component. This query can be modified as desired, or you can supply your own custom query here. After you click OK, the query is executed on the destination database.
NOTE: The generated SQL query is in the SQL Server dialect. It is recommended to have QueryPassthrough set to False in the connection manager when creating tables through this method.