SSIS Components for Spark SQL

Build 24.0.9060

Using the Source Component

After Establishing a Connection to the data source, you can use the CData Spark SQL source component to pull data into your Data Flow task.

Querying Spark SQL Data with the Source Component

Follow the procedure below to connect to Spark SQL, retrieve data, and provide data to other components in the workflow.
  1. In the SSIS Toolbox, drag the CData Spark SQL source component into the Data Flow task.
  2. Double-click the CData Spark SQL source component. The CData Spark SQL Source Editor will display.
  3. In the Connection Managers menu, select an available CData Spark SQL connection manager, or create a new connection manager with the New... button if one is not already available.
  4. Choose your access mode in the Data access mode dropdown menu. Select Table or View to use the GUI to select a table or view. Select SQL command to author a custom SQL statement in the SQL command text box.
  5. Select the Columns tab and rename any output columns as desired.
When you execute the data flow, rows from your selected table or statement will be made available to the components in the data flow.

See SELECT Statements for the supported SQL syntax and examples. The process for calling stored procedures is detailed in Calling Stored Procedures.

Using Parameterized Queries

Parameterized statements provide an efficient way to execute queries dynamically and mitigate SQL injection attacks. When using a custom SQL Command, the Source Component provides a Parameters button that launches the Set Query Parameters dialog. This interface contains the following columns:
  • Parameters: An alias for an in-query parameter in the custom SQL command box. An entry is generated for each positional parameter (represented by a ?) and named parameter (in the format @ParameterName) present in the query.
  • ColumnGlobalVariables: The system-level or package-level parameter you would like to map the in-query parameter to. The in-query parameter will resolve to the value of the mapped parameter.

In order to use the Parameters option, your query must contain parameters, which can be either defined as a positional parameter or as a named parameter:

  • Positional parameters: When setting up the parameter mapping in the Set Query Parameters dialog, the names in the Parameters column must be the index (starting from 1) of that parameter in the query. Consider this example query:
    SELECT * FROM Table WHERE FirstName = ? AND Date > ?

    Here, the names in the Parameters column must be set to '1' and '2' for 'FirstName' and 'Date' respectively.

  • Named parameters: When setting up the parameter mapping in the Set Query Parameters dialog, the names in the Parameters column must exactly match the names of the parameters in the query, without the preceding '@' symbol. Consider this example query:
    SELECT * FROM Table WHERE FirstName = @FirstName AND Date > @Date

    Here, the names in the Parameters column must be set to 'FirstName' and 'Date'.

Building Dynamic Queries in the Expression Builder

After configuring a source component, you can then use the SSIS Expression Builder to access the SQL statement that the source component executes at run time.

The component builds and executes these dynamically defined queries at runtime.

  1. In SSIS Designer, click the Control Flow tab.
  2. In the Properties pane, click in the value box for the Expressions property and click the '...' button that appears in the box. The Property Expressions Editor window appears.
  3. In the resulting Property Expressions Editor window, click an empty row in the Property box and select the SQLStatement property of the CData Spark SQL Source component in the drop-down menu.
  4. Click the "..." button at the end of the row where you just selected the SQLStatement property. The Expression Builder window appears.
  5. In the Expression box, you can create new SQL commands that use the variables available at run time as input parameters. Ensure that you enclose the expression in quotes. For example:
    "SELECT * FROM Table WHERE FirstName = '" + @[User::Name] + "' AND Date > '" + (DT_WSTR, 50) DATEADD("day", -30, GETDATE()) + "'"

Copyright (c) 2024 CData Software, Inc. - All rights reserved.
Build 24.0.9060