Databricks Select Configuration

Version 24.2.9013


Databricks Select Configuration


The Select action retrieves data from Databricks and brings it into CData Arc.

Tables and Columns

If you choose Select for the Databricks connector, you must select a target table (or tables) from Databricks. Click the Add button above the Tables pane.

A modal appears and lists all available tables. Select the desired table and click Add.

The chosen table appears under the Tables pane, and the columns in the table appear under the Columns pane. The connector automatically detects values that have special relevance (such as the primary key, foreign keys, etc.) and labels them accordingly.

By default, all columns are selected for inclusion in the action for the Databricks connector. Exclude individual columns by unchecking them.

Note: You can select more tables by clicking Add and repeating this process. See Child Tables for more information.

Filters

After you select a table, you can optionally use the Filter pane to filter the data that the connector selects. The connector translates your filter rules into query syntax, and the syntax appears in the Query pane at the bottom of the connector configuration page.

Note: You do not need to apply a filter to your data if it does not suit your use case.

Creating a Filter Rule

To create a filter rule, follow these steps:

  1. Click Add rule.
  2. In the drop-down list for the new rule, choose the column that you want to filter.
  3. Two new fields appear. In the new drop-down list, choose the filtering condition.
    • The available filtering conditions depend on the data type for the column you select. For example, a long data type has the condition Greater Than, whereas a string data type does not.
  4. If your filtering condition requires a value, enter that value in the blank value field. If your filtering condition does not require a value (for example, if you choose Is Null), the value field disappears.
  5. When your filter rule is complete, its background changes from red to white.

The image below shows an example of a filter rule on a column named Author. With this filter rule in place, the connector only selects values in the Author column that contain Smith.

Filter Groups

You can apply multiple filter rules to your connector by clicking Add rule and creating more rules. Filter rules are separated by groups, and all filters belong to the same group by default. You can create multiple groups to separate filter rules by clicking Add group.

Each filter rule in a group interacts with the others based on the options you select at the top of the group:

  • Check the NOT box to invert the filter conditions. For example, in the Author filter shown above, checking the NOT box instructs the connector to select Author values that do not contain Smith.
  • Select either AND or OR to determine the group logic for multiple filters.
    • If you select AND, every filter rule in the group must be true for the filter conditions to be met.
    • If you select OR, at least one filter rule in the group must be true for the filter conditions to be met. Multiple filter rules can be met, and the results are the same as if only one were met.

Filter groups have simple parent/child relationships. Each filter rule and filter group that belongs to a parent group exists as a single entity for the filter logic. For example, the screenshot below shows multiple filter groups.

In this example, the parent group is using OR logic. This means that if any of the following statements is true, the filter condition is met:

  • The Author column contains the value Smith
  • The Id column equals 1 or 2
  • The Title column contains the value Kitchen and the UpdatedAt column is greater than 400

If this filter condition is met, the connector Selects the data that meets the condition and passes it down the flow.

Advanced Settings

In the Advanced section, you can configure settings for using specific columns for processing records and updating columns with specific values when rows are processed successfully. These settings are useful for maintaining and tracking changes in your flow.

Selecting from Multiple Tables

The Databricks connector supports Selecting multiple tables with a single action. You can accomplish this in two ways:

  • Assign extra tables as children of a parent table. See Child Tables for more information.
  • Create multiple Select-configured Databricks connectors, assign each of them a different table, and connect them all to the same connector in the flow.

You can assign separate queries to each child table or individual Databricks connector as needed. See Filters for more information.

Example Arc Flow

After you choose the tables and columns for the Select-configured Databricks connector, place the connector at the beginning of an Arc flow. The screenshot below depicts an example flow with a Databricks Select connector, an XML Map connector, a CSV connector, and a File connector at the end:

Data in this example flow passes through these steps:

  1. The Databricks_Select connector pulls data from a Databricks connection.

  2. This data passes to the XML_Map connector as the Source File, which maps to the Source tree.

  3. Data from the CSV1 connector passes to the XML_Map connector as the Destination File, which maps to the Destination tree.

  4. The XML_Map connector attempts to automatically map Source elements to Destination elements with the same name. You can manually change these and fill in blank mappings by dragging elements from Source to Destination.

  5. The file from CSV1 that has been modified with data from Databricks_Select then passes down the flow from CSV1.

XML Aggregate Columns

Most Databricks database columns function as single elements for mapping. The examples above in Example Arc Flow demonstrate single-element columns.

Some columns appear in the Databricks connector with the XML aggregate icon </> next to their names. In contrast to a standard single-element column, an aggregate column contains a collection of entities. The following screenshot shows an aggregate column in a Select-configured Databricks connector:

In an XML Map connector, the full structure of the aggregate column expands to display all of its contents. The following screenshot shows an aggregate column expanded:

Each element of an aggregate column can be mapped the same way as single-element columns.