Establishing a Connection
Enabling SSIS in Visual Studio 2022
If you're using Visual Studio 2022, you will need to install the SQL Server Integration Services Projects extension to use SSIS.
- Navigate to Extensions > Manage Extensions.
- In the Manage Extensions window's search box, search for "SQL Server Integration Services Projects 2022" and select the extension in the list.
- Click Download.
- Close Visual Studio and run the downloaded Microsoft.DataTools.IntegrationServices.exe installer. Proceed through the installer with default settings.
- Open Visual Studio. There should now be an "Integration Services Project" project template available.
Adding the Databricks Connection Manager
Create a new connection manager as follows:
- Create a Visual Studio project with the "Integration Services Project" template.
- In the project, right-click within the Connection Managers window and select New Connection from the menu.
- In the Description column, select CData Databricks Connection Manager and click Add...
- Configure the component as described in the next section.
Alternatively, if you have an existing project that already contains a CData Databricks Source or CData Databricks Destination component:
- Right-click your CData Databricks source or destination component in your data flow.
- Select Edit... to open an editor window.
- Click the New... button next to the Connection manager: dropdown selector to create a connection manager.
- Configure the component as described in the next section.
Connecting to Databricks
To connect to a Databricks cluster, set the following properties:
- Database: The name of the Databricks database.
- Server: The Server Hostname of your Databricks cluster.
- HTTPPath: The HTTP Path of your Databricks cluster.
- Token: Your personal access token. You can obtain this value by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab.
You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.
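The properties above can be combined into a semicolon-delimited connection string. The following is a minimal Python sketch of that assembly, not part of the component itself: the helper name build_connection_string and all values shown are hypothetical placeholders, and only the property names come from the list above.

```python
def build_connection_string(props):
    """Join property names and values into a 'Key=Value;' string."""
    for name, value in props.items():
        if not value:
            raise ValueError(f"Missing value for property: {name}")
    return "".join(f"{name}={value};" for name, value in props.items())


# Placeholder values; substitute the values from your own Databricks instance.
props = {
    "Server": "<server-hostname>",
    "HTTPPath": "<http-path>",
    "Database": "default",
    "Token": "<personal-access-token>",
}
print(build_connection_string(props))
```

Raising an error on an empty value surfaces a forgotten property before the connection is attempted.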
Configuring Cloud Storage
The component supports DBFS, Azure Blob Storage, and AWS S3 for uploading CSV files.
DBFS Cloud Storage
To use DBFS for cloud storage, set the CloudStorageType property to DBFS.
Azure Blob Storage
Set the following properties:
- CloudStorageType: Azure Blob storage.
- StoreTableInCloud: True to store tables in cloud storage when creating a new table.
- AzureStorageAccount: The name of your Azure storage account.
- AzureAccessKey: The storage key associated with your Databricks account. Find this via the Azure portal (using the root account): select your storage account and click Access Keys.
- AzureBlobContainer: The name of your Azure Blob storage container.
AWS S3 Storage
Set the following properties:
- CloudStorageType: AWS S3.
- StoreTableInCloud: True to store tables in cloud storage when creating a new table.
- AWSAccessKey: The AWS account access key. You can acquire this value from your AWS security credentials page.
- AWSSecretKey: Your AWS account secret key. You can acquire this value from your AWS security credentials page.
- AWSS3Bucket: The name of your AWS S3 bucket.
- AWSRegion: The hosting region for your Amazon Web Services, for example, us-east-1. You can obtain the AWS Region value by navigating to the Buckets List page of your Amazon S3 service.
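Each storage option in this section uses a different set of properties, so it can help to check a configuration before using it. The Python sketch below is one hedged way to do that. The function name missing_storage_properties is hypothetical, treating every listed property as required is an assumption, and the CloudStorageType values are spelled as they appear in this section (the component may expect a different literal).

```python
# Properties listed in this section for each storage type.
# Assumption: each listed property is required for that type.
REQUIRED_BY_STORAGE_TYPE = {
    "DBFS": [],
    "Azure Blob storage": ["AzureStorageAccount", "AzureAccessKey", "AzureBlobContainer"],
    "AWS S3": ["AWSAccessKey", "AWSSecretKey", "AWSS3Bucket", "AWSRegion"],
}


def missing_storage_properties(settings):
    """Return the names of storage properties that are absent or empty."""
    storage_type = settings.get("CloudStorageType")
    if storage_type not in REQUIRED_BY_STORAGE_TYPE:
        raise ValueError(f"Unknown CloudStorageType: {storage_type!r}")
    required = REQUIRED_BY_STORAGE_TYPE[storage_type]
    return [name for name in required if not settings.get(name)]


# Placeholder settings with the region left out on purpose.
settings = {
    "CloudStorageType": "AWS S3",
    "AWSAccessKey": "<access-key>",
    "AWSSecretKey": "<secret-key>",
    "AWSS3Bucket": "<bucket-name>",
}
print(missing_storage_properties(settings))
```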
Authenticating to Databricks
CData supports the following authentication schemes:
- Basic
- Personal Access Token
- Azure Active Directory (AD)
- Azure Service Principal
Basic
Basic authentication requires a username and password. Set the following:
- AuthScheme: Basic.
- User: Your username. This overrides the default value ("Token").
- Token: Your password.
Personal Access Token
To authenticate, set the following:
- AuthScheme: PersonalAccessToken.
- Token: The token used to access the Databricks server. It can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab.
Azure Active Directory
To authenticate, follow these steps:
- Register an application with the AzureAD (now known as Microsoft Entra ID) endpoint in the Azure portal. See Configure an app in Azure portal for information on how to create and register the application. Alternatively, you can use an AzureAD application that is already registered.
- Set these properties:
  - AuthScheme: AzureAD.
  - AzureTenant: The "Directory (tenant) ID" on the AzureAD application "Overview" page.
  - OAuthClientId: The "Application (client) ID" on the AzureAD application "Overview" page.
  - CallbackURL: The "Redirect URIs" value on the AzureAD application "Authentication" page.
- When connecting, a web page opens that prompts you to authenticate. After successful authentication, the connection is established.
Here is an example of the connection string:
"Server=https://adb-8439982502599436.16.azuredatabricks.net;HTTPPath=sql/protocolv1/o/8439982502599436/0810-011933-odsz4s3r;database=default; AuthScheme=AzureAD;InitiateOAuth=GETANDREFRESH;AzureTenant=94be69e7-edb4-4fda-ab12-95bfc22b232f;OAuthClientId=f544a825-9b69-43d9-bec2-3e99727a1669;CallbackURL=http://localhost;"
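To check that a string like the one above carries every AzureAD property, it can be split back into name/value pairs. The Python sketch below shows one way to do this. The function name parse_connection_string is hypothetical, and lower-casing the keys is an assumption suggested by the example, which writes database in lower case. The simple split does not handle values that themselves contain semicolons.

```python
def parse_connection_string(conn_str):
    """Split a 'Key=Value;...' string into a dict with lower-cased keys."""
    props = {}
    for part in conn_str.strip().strip('"').split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing semicolon
        key, _, value = part.partition("=")
        props[key.strip().lower()] = value.strip()
    return props


# AzureAD properties named in the steps above, lower-cased for comparison.
AZURE_AD_PROPERTIES = ["authscheme", "azuretenant", "oauthclientid", "callbackurl"]

example = (
    "Server=https://adb-8439982502599436.16.azuredatabricks.net;"
    "HTTPPath=sql/protocolv1/o/8439982502599436/0810-011933-odsz4s3r;"
    "database=default; AuthScheme=AzureAD;InitiateOAuth=GETANDREFRESH;"
    "AzureTenant=94be69e7-edb4-4fda-ab12-95bfc22b232f;"
    "OAuthClientId=f544a825-9b69-43d9-bec2-3e99727a1669;"
    "CallbackURL=http://localhost;"
)
props = parse_connection_string(example)
missing = [name for name in AZURE_AD_PROPERTIES if name not in props]
print(missing)  # an empty list means every AzureAD property is present
```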
Azure AD Service Principal
To authenticate, set the following properties:
- AuthScheme: AzureServicePrincipal.
- AzureTenantId: The tenant ID of your Microsoft Azure Active Directory.
- AzureClientId: The application (client) ID of your Microsoft Azure Active Directory application.
- AzureClientSecret: The application (client) secret of your Microsoft Azure Active Directory application.
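The four schemes described in this section each take their own set of properties. As a rough summary, the Python sketch below maps each AuthScheme value to the properties listed for it and produces the matching connection-string fragment. The helper name auth_fragment is hypothetical, the values are placeholders, and treating every listed property as required is an assumption.

```python
# Properties listed in this section for each AuthScheme value.
AUTH_PROPERTIES = {
    "Basic": ["User", "Token"],
    "PersonalAccessToken": ["Token"],
    "AzureAD": ["AzureTenant", "OAuthClientId", "CallbackURL"],
    "AzureServicePrincipal": ["AzureTenantId", "AzureClientId", "AzureClientSecret"],
}


def auth_fragment(scheme, values):
    """Build the 'AuthScheme=...;' portion of a connection string."""
    if scheme not in AUTH_PROPERTIES:
        raise ValueError(f"Unsupported AuthScheme: {scheme!r}")
    missing = [name for name in AUTH_PROPERTIES[scheme] if not values.get(name)]
    if missing:
        raise ValueError(f"{scheme} is missing: {', '.join(missing)}")
    parts = [f"AuthScheme={scheme}"]
    parts += [f"{name}={values[name]}" for name in AUTH_PROPERTIES[scheme]]
    return ";".join(parts) + ";"


print(auth_fragment("PersonalAccessToken", {"Token": "<personal-access-token>"}))
```

Keeping the scheme-to-property mapping in one table makes it easy to see which values must be collected before switching schemes.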