Databricks Connector Setup
Version 22.0.8473
The Databricks connector allows you to integrate Databricks into your data flow by pushing data to Databricks or pulling data from it. Follow the steps below to connect CData Arc to Databricks.
Establish a Connection
To allow Arc to use data from Databricks, you must first establish a connection to Databricks. There are two ways to establish this connection:
- Add a Databricks connector to your flow. Then, in the settings pane, click Create next to the Connection drop-down list.
- Open the Arc Settings page, then open the Connections tab. Click Add, select Databricks, and click Next.
Note:
- The login process is only required the first time the connection is created.
- Connections to Databricks can be re-used across multiple Databricks connectors.
Enter Connection Settings
After opening a new connection dialogue, follow these steps:
- Enter the connection information:
- Name — The static name of the connection. Set this to a name of your choice.
- Type — This is always set to Databricks.
- Auth Scheme — The authentication scheme to use for the connection. Options are Personal Access Token and Azure Service Principal.
- Enter the credentials for your selected Auth Scheme:
- Personal Access Token — Your Databricks personal access token.
- Azure Service Principal — Enter the following information:
- AzureTenantId — The ID of your Microsoft Azure Active Directory tenant.
- AzureClientId — The application (client) ID of your Microsoft Azure Active Directory application.
- AzureClientSecret — The application (client) secret of your Microsoft Azure Active Directory application.
- AzureSubscriptionId — The subscription ID of your Microsoft Azure Databricks Service workspace.
- AzureResourceGroup — The resource group name of your Microsoft Azure Databricks Service workspace.
- AzureWorkspace — The name of your Microsoft Azure Databricks Service workspace.
- If necessary, click Advanced to open the drop-down menu of advanced connection settings. These settings should not be needed in most cases.
- Click Test Connection to ensure that Arc can connect to Databricks with the provided information. If an error occurs, check all fields and try again.
- Click Add Connection to finalize the connection.
- In the Connection drop-down list of the connector configuration pane, select the newly created connection.
- Click Save Changes.
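Arc performs authentication itself, so you never write this code. Still, it can help to see which settings each Auth Scheme depends on and how they are typically used. The following is a minimal sketch, not a CData Arc API. It assumes standard Azure Databricks authentication: a personal access token sent as an HTTP bearer token; an OAuth 2.0 client-credentials request to Azure Active Directory scoped to the AzureDatabricks application ID; and the workspace's Azure resource ID, built from the subscription, resource group, and workspace name. The key name `PersonalAccessToken` and all helper function names are hypothetical, and how Arc uses these fields internally is an assumption.

```python
from typing import Dict, List, Tuple

# Illustrative only: none of this is a CData Arc API.

# Application ID of the AzureDatabricks first-party app in Azure AD; tokens for
# Azure Databricks are requested with this ID as the ".default" scope.
AZURE_DATABRICKS_APP_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Required settings per auth scheme. "PersonalAccessToken" is a hypothetical
# key name; the Azure* names match the connection fields listed above.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "Personal Access Token": ["PersonalAccessToken"],
    "Azure Service Principal": [
        "AzureTenantId",
        "AzureClientId",
        "AzureClientSecret",
        "AzureSubscriptionId",
        "AzureResourceGroup",
        "AzureWorkspace",
    ],
}


def missing_fields(auth_scheme: str, settings: Dict[str, str]) -> List[str]:
    """Return the required settings that are absent or empty for the scheme."""
    if auth_scheme not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth scheme: {auth_scheme!r}")
    return [name for name in REQUIRED_FIELDS[auth_scheme] if not settings.get(name)]


def pat_headers(token: str) -> Dict[str, str]:
    """Databricks REST APIs accept a personal access token as a bearer token."""
    return {"Authorization": f"Bearer {token}"}


def workspace_resource_id(settings: Dict[str, str]) -> str:
    """Azure Resource Manager ID of the Databricks workspace."""
    return (
        f"/subscriptions/{settings['AzureSubscriptionId']}"
        f"/resourceGroups/{settings['AzureResourceGroup']}"
        f"/providers/Microsoft.Databricks/workspaces/{settings['AzureWorkspace']}"
    )


def service_principal_token_request(settings: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    url = (
        "https://login.microsoftonline.com/"
        f"{settings['AzureTenantId']}/oauth2/v2.0/token"
    )
    form = {
        "grant_type": "client_credentials",
        "client_id": settings["AzureClientId"],
        "client_secret": settings["AzureClientSecret"],
        "scope": f"{AZURE_DATABRICKS_APP_ID}/.default",
    }
    return url, form


if __name__ == "__main__":
    # Placeholder values only.
    sp_settings = {
        "AzureTenantId": "my-tenant-id",
        "AzureClientId": "my-client-id",
        "AzureClientSecret": "my-client-secret",
        "AzureSubscriptionId": "my-subscription-id",
        "AzureResourceGroup": "my-resource-group",
        "AzureWorkspace": "my-workspace",
    }
    print(missing_fields("Azure Service Principal", sp_settings))  # []
    print(workspace_resource_id(sp_settings))
```

A check like `missing_fields` mirrors the kind of mistake that Test Connection tends to surface: an empty or misspelled field for the chosen scheme.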
Select an Action
After establishing a connection to Databricks, you must choose the action that the Databricks connector will perform. The table below outlines each action and where it belongs in an Arc flow.
Action | Description | Position in Flow |
---|---|---|
Upsert | Inserts or updates Databricks data. By default, if a record already exists in Databricks, an update is performed on the existing data in Databricks using the values provided from the input. | End |
Lookup | Retrieves a value from Databricks and inserts that value into an already-existing Arc message in the flow. The Lookup Query determines what value the connector will retrieve from Databricks. It should be formatted as a SQL query against the Databricks tables. | Middle |
Select | Retrieves data from Databricks and brings it into Arc. You can use the Filter panel to add filters to the Select. These filters function similarly to WHERE clauses in SQL. | Beginning |
Execute Stored Procedures | Treats data coming into the connector as input for a stored procedure, and then passes the result down the flow. You can click the Show Sample Data button to provide sample inputs to the selected Stored Procedure and preview the results. | Middle |
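The default Upsert behavior in the table (update the record if it already exists, otherwise insert it) can be illustrated with a minimal in-memory sketch. The dictionary standing in for a Databricks table, the `upsert` helper, and the use of a single key column to detect existing records are hypothetical. This section does not say how Arc decides that a record already exists. On Delta tables, comparable insert-or-update logic is usually written in SQL with `MERGE INTO`, but this section does not say whether Arc issues such a statement.

```python
from typing import Any, Dict, List

# Illustrative only: a hypothetical stand-in, not how Arc implements Upsert.
Record = Dict[str, Any]


def upsert(table: Dict[Any, Record], rows: List[Record], key: str) -> Dict[str, int]:
    """Insert rows with a new key; update existing rows with the input values.

    `table` maps each key value to its stored record and is modified in place.
    """
    counts = {"inserted": 0, "updated": 0}
    for row in rows:
        k = row[key]
        if k in table:
            # Record already exists: overwrite its columns with the input values.
            # Columns absent from the input are left unchanged (an assumption).
            table[k].update(row)
            counts["updated"] += 1
        else:
            # New record: store a copy so later changes to `row` don't leak in.
            table[k] = dict(row)
            counts["inserted"] += 1
    return counts


if __name__ == "__main__":
    customers = {1: {"id": 1, "name": "Ada", "city": "London"}}
    result = upsert(
        customers,
        [{"id": 1, "city": "Paris"}, {"id": 2, "name": "Grace", "city": "Arlington"}],
        key="id",
    )
    print(result)        # {'inserted': 1, 'updated': 1}
    print(customers[1])  # {'id': 1, 'name': 'Ada', 'city': 'Paris'}
```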
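A Lookup can be pictured as two steps: run the Lookup Query with a value taken from the incoming message, then write the result into that same message. The sketch below uses an in-memory SQLite database as a stand-in for a Databricks table. The flat-dictionary message shape, the `?` parameter placeholder, and the helper name `lookup_into_message` are hypothetical. This section does not show Arc's actual message format or how a Lookup Query references message values.

```python
import sqlite3
from typing import Any, Dict, Optional

# Illustrative only: SQLite stands in for Databricks; not an Arc API.


def lookup_into_message(
    message: Dict[str, Any],
    conn: sqlite3.Connection,
    lookup_query: str,
    source_field: str,
    target_field: str,
) -> Dict[str, Any]:
    """Return a copy of `message` with `target_field` set from the lookup result."""
    row: Optional[tuple] = conn.execute(lookup_query, (message[source_field],)).fetchone()
    enriched = dict(message)
    # Use the first column of the first row; None if the query matched nothing.
    enriched[target_field] = row[0] if row is not None else None
    return enriched


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO customers VALUES (42, 'ada@example.com')")
    order = {"order_id": "A-100", "customer_id": 42}
    print(lookup_into_message(
        order, conn,
        "SELECT email FROM customers WHERE id = ?",
        source_field="customer_id", target_field="customer_email",
    ))
    # {'order_id': 'A-100', 'customer_id': 42, 'customer_email': 'ada@example.com'}
```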
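Because Select filters act like SQL WHERE clauses, it may help to see how a list of simple filters could become a parameterized WHERE clause. This is a hedged sketch, not the query Arc generates. Combining filters with AND, the supported operator set, the `?` placeholder style, and the name `build_select` are all assumptions. Only filter values are parameterized, so table and column names are assumed to come from trusted configuration.

```python
from typing import Any, List, Sequence, Tuple

# Illustrative only: not the SQL that Arc actually builds.

# Comparison operators this sketch accepts; anything else is rejected.
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}

Filter = Tuple[str, str, Any]  # (column, operator, value)


def build_select(
    table: str, columns: Sequence[str], filters: Sequence[Filter]
) -> Tuple[str, List[Any]]:
    """Build a parameterized SELECT whose filters become an AND-ed WHERE clause."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params: List[Any] = []
    clauses: List[str] = []
    for column, op, value in filters:
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        # Values go into the parameter list, never into the SQL text itself.
        clauses.append(f"{column} {op} ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    sql, params = build_select(
        "orders", ["id", "total"], [("status", "=", "open"), ("total", ">", 100)]
    )
    print(sql)     # SELECT id, total FROM orders WHERE status = ? AND total > ?
    print(params)  # ['open', 100]
```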