Databricks
Version 25.3.9396
You can use the Databricks connector from the CData Sync application to move data from any supported source to the Databricks destination. To do so, add the connector, authenticate to Databricks, and complete your connection.
Add the Databricks Connector
To enable Sync to work with Databricks, you must first add the connector, as follows:
-
Open the Connections page of the Sync dashboard.
-
Click Add Connection to open the Select Connectors page.
-
Click the Destinations tab and locate the Databricks row.
-
Click the Configure Connection icon at the end of that row to open the New Connection page. If the Configure Connection icon is not available, click the Download Connector icon to install the Databricks connector. For more information about installing new connectors, see Connections.
Authenticate to Databricks
After you add the connector, you need to set the required properties.
-
Connection Name - Enter a connection name of your choice.
-
Server - Enter the host name or the IP address of the server that hosts the Databricks database.
-
HTTP Path - Enter the HTTP path of your Databricks cluster.
-
Catalog - Enter the name of your catalog. The default catalog name is hive_metastore.
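The required properties above can be prepared before you type them into the New Connection page. The following is a minimal sketch, not part of CData Sync: the function names are hypothetical, and it only assumes that Server expects a bare host name (no scheme, port, or query string) and that an empty Catalog falls back to hive_metastore, as described above.

```python
from urllib.parse import urlparse


def normalize_server(value: str) -> str:
    """Return only the host name from a pasted workspace URL or host."""
    value = value.strip()
    # urlparse only fills in .hostname when a scheme is present.
    parsed = urlparse(value if "://" in value else "https://" + value)
    if not parsed.hostname:
        raise ValueError(f"cannot find a host name in {value!r}")
    return parsed.hostname


def required_properties(name, server, http_path, catalog=""):
    """Collect the four required properties (hypothetical helper)."""
    http_path = http_path.strip()
    if not http_path.strip("/"):
        raise ValueError("HTTP Path must not be empty")
    return {
        "Connection Name": name.strip(),
        "Server": normalize_server(server),
        "HTTP Path": http_path,
        # hive_metastore is the default catalog name given above.
        "Catalog": catalog.strip() or "hive_metastore",
    }
```

A pasted browser URL such as https://adb-1234.5.azuredatabricks.net/?o=1234 is reduced to the host name alone, which is the form the Server property asks for.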
CData Sync supports authenticating to Databricks in several ways. Select your authentication method below to proceed to the relevant section that contains the authentication details.
- Personal Access Token (default)
- Basic
- User-to-Machine Authentication
- Machine-to-Machine Authentication
- Azure Service Principal
- Azure Active Directory
Personal Access Token
To connect with a personal access token (PAT), set the following properties:
-
Auth Scheme - Select PersonalAccessToken.
-
Token - Enter the PAT that you use to connect to the Databricks server.
To obtain your token:
-
In your Azure Databricks workspace, click your username in the top bar and select Settings.
-
Click Developer.
-
Click Manage (next to Access tokens). Then, click Generate new token > Generate.
-
Copy and save your new token because it will not be displayed again.
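Before you paste a new token into Sync, you might want to confirm that the workspace accepts it. The sketch below builds (but does not send) a request to the Databricks REST API with the token as a bearer credential. The helper name is hypothetical, and the choice of the clusters list endpoint as a harmless read-only call is an assumption; any endpoint that your user can read would serve the same purpose.

```python
import urllib.request


def build_pat_check_request(server: str, token: str) -> urllib.request.Request:
    """Build a GET request that presents the PAT as a bearer token."""
    # Assumption: /api/2.0/clusters/list is readable by the token's owner.
    url = f"https://{server}/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

To run the check, pass the result to urllib.request.urlopen with a timeout. A rejected token typically surfaces as an urllib.error.HTTPError with status 401 or 403.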
Basic
To connect with your user credentials, set the following properties:
-
Auth Scheme - Select Basic.
-
User - Enter the username that you use to authenticate to your Databricks account.
-
Token - Enter the token that you use to authenticate to your Databricks account.
User-to-Machine Authentication
User-to-machine (U2M) authentication enables you to grant tools or applications (for example, a command-line interface) access to your workspace. To authenticate with this method, set the following properties:
-
Auth Scheme - Select OAuthU2M.
-
OAuth Level - Select the level from which you want to generate an access token.
-
AccountLevel: A Databricks account represents a single entity that can include multiple workspaces. You can use accounts that are enabled for Unity Catalog to manage users and their access to data centrally across all of the workspaces in the account.
For this level, you need to enter your account identifier (Id) in the Databricks Account Id text box.
-
WorkspaceLevel (default): In Databricks, a workspace is a Databricks deployment in the cloud that functions as an environment for your team to access Databricks assets.
No other settings are necessary for this level.
Machine-to-Machine Authentication
Machine-to-machine (M2M) authentication verifies the identity of devices or applications that communicate over a network. To authenticate with this method, set the following properties:
-
Auth Scheme - Select OAuthM2M.
-
OAuth Client Id - Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.
-
OAuth Client Secret - Enter the client secret that you were assigned when you registered your application with an OAuth authorization server.
-
OAuth Level - Select the level from which you want to generate an access token.
-
AccountLevel: A Databricks account represents a single entity that can include multiple workspaces. You can use accounts that are enabled for Unity Catalog to manage users and their access to data centrally across all of the workspaces in the account.
For this level, you need to enter your account Id in the Databricks Account Id text box.
-
WorkspaceLevel (default): In Databricks, a workspace is a Databricks deployment in the cloud that functions as an environment for your team to access Databricks assets.
No other settings are necessary for this level.
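The difference between the two OAuth levels is easier to see in terms of where a token is requested. The following sketch shows how a client-credentials (M2M) token request could be assembled for each level. It is illustrative only and does not describe how Sync is implemented: the function names are hypothetical, and the endpoint paths, the accounts host, and the all-apis scope are assumptions based on Databricks OAuth conventions that you should verify for your cloud.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: the accounts host differs by cloud
# (for example, accounts.azuredatabricks.net on Azure).
ACCOUNTS_HOST = "accounts.cloud.databricks.com"


def token_endpoint(oauth_level, server, account_id=None,
                   accounts_host=ACCOUNTS_HOST):
    """Pick the token URL for the selected OAuth Level."""
    if oauth_level == "WorkspaceLevel":
        # The workspace host alone is enough at this level.
        return f"https://{server}/oidc/v1/token"
    if oauth_level == "AccountLevel":
        # The account Id is part of the URL, so it must be supplied.
        if not account_id:
            raise ValueError("AccountLevel needs a Databricks Account Id")
        return f"https://{accounts_host}/oidc/accounts/{account_id}/v1/token"
    raise ValueError(f"unknown OAuth Level: {oauth_level!r}")


def build_m2m_token_request(endpoint, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Under these assumptions, the account Id is needed only at AccountLevel because it forms part of the token URL, which matches the property requirements described above.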
Azure Service Principal
To connect with an Azure service principal, set the following properties:
-
Auth Scheme - Select AzureServicePrincipal.
-
Azure Tenant Id - Enter the tenant Id for your Microsoft Azure Active Directory application.
-
Azure Client Id - Enter the client Id for your Microsoft Azure Active Directory application.
-
Azure Client Secret - Enter the client secret for your Microsoft Azure Active Directory application.
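To see how the three Azure properties fit together, the sketch below assembles the kind of client-credentials request that a service principal sends to the Microsoft identity platform. It is an illustration under stated assumptions rather than a description of Sync internals: the helper name is hypothetical, and the scope uses the well-known Azure Databricks resource Id, which you should confirm against Microsoft documentation.

```python
import urllib.parse
import urllib.request

# Assumption: well-known application Id of the Azure Databricks resource.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_azure_sp_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a token request for a service principal."""
    # Azure Tenant Id selects the directory that issues the token.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,          # Azure Client Id
            "client_secret": client_secret,  # Azure Client Secret
            "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
        }
    ).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```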
Azure Active Directory
To connect with an Azure Active Directory user account, set the following properties:
-
Auth Scheme - Select AzureAD.
-
Azure Tenant - Enter the Microsoft Online tenant that you use to access data. If you do not specify a tenant, Sync uses the default tenant.
-
OAuth Client Id - Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.
Complete Your Connection
To complete your connection:
-
(Optional) For Database, enter the name of your Databricks database.
-
Define advanced connection settings on the Advanced tab. (In most cases, though, you should not need these settings.)
-
Click Connect to Databricks to connect to your Databricks account.
-
Click Create & Test to create your connection.
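Before you click Create & Test, it can help to check that every property required by your chosen Auth Scheme has a value. The sketch below restates the requirements from the authentication sections as a lookup table. The helper and the dictionary layout are hypothetical and exist only for illustration; the property and scheme names are the ones used on this page.

```python
COMMON = ["Connection Name", "Server", "HTTP Path", "Catalog"]

# Extra properties that each Auth Scheme needs, as listed on this page.
REQUIRED_BY_SCHEME = {
    "PersonalAccessToken": ["Token"],
    "Basic": ["User", "Token"],
    "OAuthU2M": [],
    "OAuthM2M": ["OAuth Client Id", "OAuth Client Secret"],
    "AzureServicePrincipal": [
        "Azure Tenant Id", "Azure Client Id", "Azure Client Secret",
    ],
    "AzureAD": ["OAuth Client Id"],  # Azure Tenant is optional
}


def missing_properties(settings: dict) -> list:
    """Return the names of required properties that have no value."""
    # PersonalAccessToken is the default scheme.
    scheme = settings.get("Auth Scheme", "PersonalAccessToken")
    if scheme not in REQUIRED_BY_SCHEME:
        raise ValueError(f"unknown Auth Scheme: {scheme!r}")
    required = COMMON + REQUIRED_BY_SCHEME[scheme]
    # WorkspaceLevel is the default OAuth Level and needs nothing else.
    level = settings.get("OAuth Level", "WorkspaceLevel")
    if scheme in ("OAuthU2M", "OAuthM2M") and level == "AccountLevel":
        required = required + ["Databricks Account Id"]
    return [name for name in required if not settings.get(name)]
```

An empty list means that nothing required is missing for the selected scheme. The optional Database property and the Advanced tab settings are deliberately left out of the check.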
More Information
For more information about interactions between CData Sync and Databricks, see Databricks Connector for CData Sync.