SSIS Components for Apache Hive

Build 21.0.7930

Using the Destination Component

After Establishing a Connection to the data source, add the CData Apache Hive destination component to the workflow to load data into Apache Hive.

Writing to Apache Hive in a Data Flow

Follow the steps below to connect to Apache Hive and update data.

  1. In the SSIS Toolbox, drag the CData Apache Hive destination component into the Data Flow Task.
  2. Connect the output of a source component to the CData Apache Hive destination component.
  3. Double-click the CData Apache Hive destination component. The CData Apache Hive Destination Editor dialog opens.
  4. In the Connection Managers menu, select an available CData Apache Hive connection manager, or create a new instance if one is not already available.
  5. In the "Use a Table" option, select the table to update.
  6. Select the data manipulation action. See below for more information on each action.
  7. On the Mappings tab, configure the mappings from source to destination. By default, each output column from the source component is automatically mapped to the destination column with the same name in the table you selected. You can adjust these mappings as needed.

Note: Read-only columns will not be visible among the destination columns since they cannot be written to.
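The name-based auto-mapping described in step 7 can be sketched as follows. This is a simplified illustration only, not the component's actual implementation, and the column names are hypothetical:

```python
# Sketch of name-based column auto-mapping (illustration only; not the
# component's actual implementation). Read-only destination columns are
# excluded, mirroring the note above.

def auto_map(source_columns, destination_columns, read_only=()):
    """Map each source column to the destination column with the same name."""
    writable = set(destination_columns) - set(read_only)
    return {src: src for src in source_columns if src in writable}

# Hypothetical columns for a Hive table
source_cols = ["EmployeeId", "Name", "Department"]
dest_cols = ["EmployeeId", "Name", "Department", "RowVersion"]

mapping = auto_map(source_cols, dest_cols, read_only=["RowVersion"])
print(mapping)  # {'EmployeeId': 'EmployeeId', 'Name': 'Name', 'Department': 'Department'}
```

Unmatched source columns are simply left unmapped; you can then map them manually in the editor.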

Command Execution

When you execute the data flow, the component will execute one of the following operations to update the destination table.

Insert

The component will take the mapped values and attempt to insert the data as new rows into the table. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the insert in the error output of the component with the 'Redirect row' error behavior.

Update

The component will attempt to update an existing row based on the primary key provided. The primary key column must be mapped, and it must not be null. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the update in the error output of the component with the 'Redirect row' error behavior.

Upsert

The component uses the primary key to decide if a row is to be inserted or updated. If the primary key column is mapped and it is not null, the component will attempt to update an existing row based on the primary key provided. If the primary key is not mapped or if it is null, the component will attempt to insert the data as a new row. By setting the OutputKey property to True in the destination component's properties, you can retrieve the results of the upsert in the error output of the component with the 'Redirect row' error behavior.
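The upsert decision described above can be sketched as a simplified model, assuming a single primary-key column (the actual component operates on the mapped SSIS buffer):

```python
# Simplified model of the Upsert decision (illustration only).
# If the primary key column is mapped and its value is non-null, the row
# is updated; otherwise it is inserted as a new row.

def upsert_action(row, key_column, mapped_columns):
    """Return 'UPDATE' or 'INSERT' for a row, following the upsert rule."""
    key_is_mapped = key_column in mapped_columns
    if key_is_mapped and row.get(key_column) is not None:
        return "UPDATE"
    return "INSERT"

mapped = ["Id", "Name"]
print(upsert_action({"Id": 42, "Name": "Ada"}, "Id", mapped))    # UPDATE
print(upsert_action({"Id": None, "Name": "Ada"}, "Id", mapped))  # INSERT
print(upsert_action({"Name": "Ada"}, "Id", ["Name"]))            # INSERT (key not mapped)
```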

Delete

The component will attempt to delete an existing row based on the primary key provided. The primary key column must be mapped, and it must not be null.

Bulk Operations

By default, the destination component uses bulk operations to update the data source. This behavior is controlled by the BatchMode and BatchSize properties of the component. BatchSize controls the maximum number of rows submitted to the data source in a single batch. Depending on the volume of data being submitted, increasing BatchSize can improve throughput at the cost of a larger memory footprint.
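The effect of BatchSize can be sketched as chunking the incoming rows (a simplified model; the component's actual batching is internal):

```python
# Simplified model of BatchMode/BatchSize (illustration only): rows are
# grouped into batches of at most batch_size rows, and each batch would be
# submitted to the data source in a single operation.

def make_batches(rows, batch_size):
    """Split rows into consecutive batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

rows = list(range(10))            # ten hypothetical rows
batches = make_batches(rows, 4)
print([len(b) for b in batches])  # [4, 4, 2]
```

Fewer, larger batches mean fewer round trips to the data source, which is why raising BatchSize can improve throughput while holding more rows in memory at once.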

Copyright (c) 2021 CData Software, Inc. - All rights reserved.