Using the Execute SQL Task
After Establishing a Connection to the data source, you can use the CData Apache Hive Task run stored procedures and SQL queries at the Control Flow level.
Querying Apache Hive Data with the Apache Hive TaskComplete the following steps to connect to Apache Hive and execute custom SQL queries at the Control Flow level. A common use case for this would be to truncate table data or executing stored procedures before executing your workflow.
- In the SSIS Toolbox, drag the CData Apache Hive Task into the Control Flow.
- Double-click the CData Apache Hive Task. The CData Apache Hive Task opens.
- In the Connection drop-down list, select an available CData Apache Hive connection manager, or create a new instance if one is not already available.
- Configure the query you want to execute. There are three properties that control this:
- SQLSourceType: Where to source the query from, either a direct input in the CData Apache Hive Task UI or from a variable in the package.
- CommandType: The type of command to run. The options are Table, Stored Procedure, or Command Text. For the first two, the Task expects only the name of the object to execute. The Command Text option allows you to execute any SQL command you'd like.
- The last option is context-specific based on the values you set for the above two settings.
- Optionally, you may assign parameters and/or a result set, as described in the following section.
See SELECT Statements for the supported SQL syntax and examples.
Defining the Parameter MappingParameters can be used with the Stored Procedure and Command Text CommandTypes. On the Parameter Mapping tab, add as many Parameters as you need. When using the Stored Procedure option, the Parameter Name must exactly match the name of the parameter defined in the stored procedure. For the Command Text option, the name must match the name used in the SQL command. You can assign a variable to either read from or write to depending on the Direction chosen. The available Direction options are as follows:
- Input Indicates the value is read from an SSIS variable when executing the call.
- Output: Indicates the value is returned from the command call and stored in a local variable for use later in the package.
- InOut: A combination of the above. The value is read when the query is executed and the returned value overwrites the existing value after execution is complete.
Configuring a Result SetA Result Set can be configured if the query results are needed later in the package. There are two options: Single Row and Full ResultSet. Each one affects how the values are output by the Task and how you need to define the output on the Result Set tab in the UI. Note that when using a Stored Procedure, defining OUT or IN/OUT parameters can be used instead of configuring a Result Set.
- Single Row: When a single row is returned, you need to set the ResultSet Name to the index (i.e. 0, 1, 2, etc.) of the column you are querying. The Variable Name must be mapped to a variable in SSIS that is the correct data type as the output column.
- Full ResultSet: With the Full ResultSet option, only a single value is returned which is a DataTable containing the full data returned from the query. The ResultSet Name should be set to 0. The SSIS Variable Name must be set to a variable of type Object. Note that if multiple ResultSets are returned from the query, the Task currently only supports outputting the first.