Azure Data Lake Storage

Version 25.3.9396


You can use the Azure Data Lake Storage connector from the CData Sync application to capture data from Azure Data Lake Storage and move it to any supported destination. To do so, you need to add the connector, authenticate to the connector, and complete your connection.

Supported File Formats

When Sync writes data to Azure Data Lake Storage, you can choose the file format for the exported data. The following file formats are supported for the Azure Data Lake Storage destination:

  • (Default) CSV—Plain text comma-separated values.

  • Avro—A row-based binary format that supports schema evolution.

  • Parquet—A columnar storage format that is optimized for analytics.
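If you are unsure which format an existing file uses, the leading bytes usually tell you: Parquet files begin with the marker PAR1, and Avro object container files begin with Obj followed by the byte 0x01. The following is a minimal sketch of that check. The function name sniff_format is hypothetical, and treating everything else as CSV is an assumption made for illustration; this is not how Sync itself selects a format.

```python
def sniff_format(head: bytes) -> str:
    """Guess a file format from the first bytes of the file."""
    if head.startswith(b"PAR1"):
        # Parquet files start (and end) with the 4-byte marker "PAR1".
        return "Parquet"
    if head.startswith(b"Obj\x01"):
        # Avro object container files start with "Obj" plus the byte 0x01.
        return "Avro"
    # Assumption: anything without a known marker is treated as delimited text.
    return "CSV"


if __name__ == "__main__":
    print(sniff_format(b"PAR1\x15\x04"))
    print(sniff_format(b"Obj\x01\x04\x14avro.codec"))
    print(sniff_format(b"id,name\n1,Ada\n"))
```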

Add the Azure Data Lake Storage Connector

To enable Sync to use data from Azure Data Lake Storage, you must first add the connector, as follows:

  1. Open the Connections page of the Sync dashboard.

  2. Click Add Connection to open the Select Connectors page.

  3. Click the Sources tab and locate the Azure Data Lake Storage row.

  4. Click the Configure Connection icon at the end of that row to open the New Connection page. If the Configure Connection icon is not available, click the Download Connector icon to install the Azure Data Lake Storage connector. For more information about installing new connectors, see Connections.

Authenticate to Azure Data Lake Storage

After you add the connector, you need to set the required properties.

  • Connection Name: Enter a connection name of your choice.

  • File Format: Select the file format that you want to use: CSV (default), Avro, or Parquet.

    Note: Although the Delta Parquet file format is listed in the available file formats, Sync does not support delta file formats for sources. This option appears in the UI for file-based source connectors only because the delta file format cannot be restricted based on whether the connector is a source or a destination.

  • Azure Storage Account: Enter the name of your Azure storage account.

  • URI: Enter the path of the file system and folder that contains your files (for example, abfss://MyFileSystem/FolderName).
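To see how a URI value such as the example above breaks down into a file system and a folder, you can split it with a short script. This is a sketch only: the function name split_adls_uri is hypothetical, and accepting the abfs scheme and the filesystem@account host form in addition to the documented abfss://FileSystem/Folder form is an assumption, not documented Sync behavior.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_adls_uri(uri: str) -> Tuple[str, str]:
    """Split an ADLS-style URI into (file_system, folder)."""
    parsed = urlparse(uri)
    # Assumption: "abfs" is accepted alongside the documented "abfss" scheme.
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"Unexpected scheme in URI: {uri!r}")
    # The host part may be "filesystem" or "filesystem@account.dfs.core.windows.net".
    file_system = parsed.netloc.split("@", 1)[0]
    if not file_system:
        raise ValueError(f"No file system found in URI: {uri!r}")
    folder = parsed.path.strip("/")
    return file_system, folder


if __name__ == "__main__":
    print(split_adls_uri("abfss://MyFileSystem/FolderName"))
```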

CData Sync supports authenticating to Azure Data Lake Storage in several ways. Select your authentication method below to proceed to the relevant section that contains the authentication details.

Azure Active Directory

To connect with an Azure Active Directory (AD) user account, specify the following properties:

  • Auth Scheme: Select AzureAD.

  • Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.

Azure Managed Service Identity

To leverage Azure Managed Service Identity (MSI) when CData Sync is running on an Azure virtual machine, select Azure MSI for Auth Scheme. No additional properties are required.
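For background, a managed identity on an Azure virtual machine normally obtains its token from the Azure Instance Metadata Service (IMDS), which is reachable only from inside the machine. The sketch below builds that request without sending it, so you can see what is involved. The function name build_msi_token_request is hypothetical, and whether Sync issues exactly this request internally is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# IMDS endpoint that is reachable only from inside an Azure virtual machine.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_msi_token_request(resource: str = "https://storage.azure.com/") -> Request:
    """Build (but do not send) a managed-identity token request for Azure Storage."""
    query = urlencode({"api-version": "2018-02-01", "resource": resource})
    # IMDS rejects requests that do not carry the "Metadata: true" header.
    return Request(f"{IMDS_TOKEN_ENDPOINT}?{query}", headers={"Metadata": "true"})


if __name__ == "__main__":
    request = build_msi_token_request()
    print(request.full_url)
    # To send it from an Azure VM: urllib.request.urlopen(request, timeout=5)
```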

Azure Service Principal

To connect with an Azure service principal and client secret, set the following properties:

  • Auth Scheme: Select AzureServicePrincipal.

  • Azure Tenant: Enter the Microsoft Online tenant to which you want to connect.

  • OAuth Client Id: Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.

  • OAuth Client Secret: Enter the client secret that you were assigned when you registered your application with an OAuth authorization server.

  • (Optional) Scope: Specify the scope of your access to the application.

  • (Optional) OAuth Authorization URL: Enter the OAuth authorization URL for the OAuth service.

  • (Optional) OAuth Access Token URL: Enter the URL from which to retrieve the access token.

  • (Optional) OAuth Refresh Token URL: Enter the URL from which to refresh the OAuth token.

To obtain the OAuth client Id and client secret for your application:

  1. Log in to the Azure portal.

  2. In the left navigation pane, select All services. Then, search for and select App registrations.

  3. Click New registration.

  4. Enter an application name and select Any Azure AD Directory - Multi Tenant.

  5. After you create the application, copy the application (client) Id value that is displayed in the Overview section. Use this value as the OAuth client Id.

  6. Navigate to the Certificates & Secrets section and select New Client Secret for the application.

  7. Specify the duration and save the client secret. After you save it, the key value is displayed.

  8. Copy this value because it is displayed only once. You will use this value as the OAuth client secret.

  9. On the Authentication tab, make sure to select Access tokens (used for implicit flows).
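The Azure Tenant, OAuth Client Id, OAuth Client Secret, and Scope properties correspond to the fields of a standard OAuth 2.0 client-credentials request against the Microsoft identity platform. The sketch below assembles such a request without sending it. The function name and the placeholder values are hypothetical, the default scope shown is the usual one for Azure Storage, and this is not necessarily the exact request that Sync sends.

```python
from typing import Tuple
from urllib.parse import urlencode


def build_client_credentials_request(
    tenant: str,
    client_id: str,
    client_secret: str,
    scope: str = "https://storage.azure.com/.default",
) -> Tuple[str, str]:
    """Return (token_url, form_body) for an OAuth 2.0 client-credentials request."""
    token_url = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
    form_body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    )
    return token_url, form_body


if __name__ == "__main__":
    # Placeholder values only; substitute the values from your app registration.
    url, body = build_client_credentials_request(
        "my-tenant-id", "my-client-id", "my-client-secret"
    )
    print(url)
    print(body)
```

The body would be sent as an application/x-www-form-urlencoded POST to the token URL. Keep the client secret out of source control and logs.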

Azure Service Principal Certificate

To connect with an Azure service principal and client certificate, set the following properties:

  • Auth Scheme: Select AzureServicePrincipalCert.

  • Azure Tenant: Enter the Microsoft Online tenant to which you want to connect.

  • OAuth JWT Cert: Enter your JSON Web Token (JWT) certificate store.

  • OAuth JWT Cert Type: Enter the type of key store that contains your JWT certificate. The default type is PEMKEY_BLOB.

  • OAuth Client Id: Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.

  • (Optional) Scope: Specify the scope of your access to the application.

  • (Optional) OAuth Authorization URL: Enter the OAuth authorization URL for the OAuth service.

  • (Optional) OAuth Access Token URL: Enter the URL from which to retrieve the access token.

  • (Optional) OAuth Refresh Token URL: Enter the URL from which to refresh the OAuth token.

  • (Optional) OAuth JWT Cert Password: Enter the password for your OAuth JWT certificate.

  • (Optional) OAuth JWT Cert Subject: Enter the subject of your OAuth JWT certificate.

To obtain the OAuth certificate for your application:

  1. Log in to the Azure portal.

  2. In the left navigation pane, select All services. Then, search for and select App registrations.

  3. Click New registration.

  4. Enter an application name and select Any Azure AD Directory - Multi Tenant.

  5. After you create the application, copy the application (client) Id value that is displayed in the Overview section. Use this value as the OAuth client Id.

  6. Navigate to the Certificates & Secrets section and select Upload certificate. Then, select the certificate to upload from your local machine.

  7. Specify the duration and save the client secret. After you save it, the key value is displayed.

  8. Copy this value because it is displayed only once. You will use this value as the OAuth client secret.

  9. On the Authentication tab, make sure to select Access tokens (used for implicit flows).

Azure Access Key

To connect with an Azure access key, set the following properties:

  • Auth Scheme: Select Access Key.

  • Azure Access Key: Enter the access key that is associated with your storage account.

To retrieve your access key:

  1. Sign in to the Azure portal with the credentials for your root account.

  2. Click Storage accounts and select the storage account that you want to use.

  3. Under Settings, click Access keys. Your storage account name and key are displayed on that page.
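Copy-and-paste mistakes in the access key are a common cause of failed connections. Azure storage account keys are 512-bit values encoded as Base64, so a quick local check can catch a truncated or mangled key before you enter it. The following is a sketch; the function name looks_like_storage_key is hypothetical, and passing this check says nothing about whether the key is actually valid for your account.

```python
import base64
import binascii


def looks_like_storage_key(key: str) -> bool:
    """Return True if the value decodes as Base64 to 64 bytes (512 bits)."""
    try:
        raw = base64.b64decode(key.strip(), validate=True)
    except (binascii.Error, ValueError):
        # Not valid Base64 (bad characters, bad padding, or non-ASCII text).
        return False
    return len(raw) == 64


if __name__ == "__main__":
    # A dummy all-zero key, used only to demonstrate the expected shape.
    dummy_key = base64.b64encode(bytes(64)).decode("ascii")
    print(looks_like_storage_key(dummy_key))
    print(looks_like_storage_key("not-a-key"))
```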

Azure Shared Access Signature

To connect with an Azure shared access signature, set the following properties:

  • Auth Scheme: Select AzureStorageSAS.

  • Azure Shared Access Signature: Enter the shared access signature that is associated with the storage account.

To create an Azure shared access signature:

  1. Sign in to the Azure portal with the credentials for your root account.

  2. Click Storage accounts and select the storage account that you want to use.

  3. Under Settings, click Shared Access Signature.

  4. Set the permissions and a date when the token will expire.

  5. Click Generate SAS and copy the token that is generated.
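Because a shared access signature stops working once it expires, it can help to inspect the token before you paste it into Sync. A SAS token is a query string whose sp field holds the permissions and whose se field holds the expiry time. The sketch below reads both. The function name describe_sas and the sample token are hypothetical, and the code assumes that the expiry uses the YYYY-MM-DDThh:mm:ssZ form; other ISO 8601 forms that Azure accepts are not handled.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def describe_sas(token: str, now: Optional[datetime] = None) -> dict:
    """Report the permissions and expiry that are encoded in a SAS token."""
    fields = parse_qs(token.lstrip("?"))
    if "se" not in fields:
        raise ValueError("SAS token has no 'se' (expiry) field")
    # Assumption: expiry is written as YYYY-MM-DDThh:mm:ssZ (UTC).
    expires = datetime.strptime(fields["se"][0], "%Y-%m-%dT%H:%M:%SZ")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return {
        "permissions": fields.get("sp", [""])[0],
        "expires": expires,
        "expired": expires <= now,
    }


if __name__ == "__main__":
    # Sample token with a placeholder signature; it is not a working credential.
    sample = "?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2030-01-01T00:00:00Z&sig=PLACEHOLDER"
    print(describe_sas(sample))
```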

Complete Your Connection

To complete your connection:

  1. Specify the following properties:

    For the CSV file format:

    • FMT: Enter the format that you want to use to parse all text files. The default format is CsvDelimited.

    • Aggregate Files: Specify whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. The default option is False.

    • Include Column Headers: Specify whether you want to obtain column headers from the first lines of the specified files. The default option is True.

    For the Avro and Parquet file formats:

    • Data Model: Select the data model that you want to use to parse documents for your format and to generate the database metadata. The default data model is Document.

    • Aggregate Files: Specify whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. The default option is False.

  2. Define advanced connection settings on the Advanced tab. (In most cases, though, you should not need these settings.)

  3. If you authenticate with AzureAD, click Connect to Azure Data Lake Storage to connect to your Azure Data Lake Storage account.

  4. Click Create & Test to create your connection.
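To get a feel for what the Aggregate Files option implies for CSV files, you can check which files in a folder share the same header row, because only files with the same schema are combined into the AggregatedFiles table. The sketch below groups in-memory CSV text by header. It is a conceptual illustration only: the function name and the sample files are hypothetical, it assumes that Include Column Headers is True, and it is not Sync's own aggregation logic.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_header(files: Dict[str, str]) -> Dict[Tuple[str, ...], List[str]]:
    """Group file names by the column names in the first line of each file."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for name, text in files.items():
        # An empty file yields an empty header tuple.
        header = tuple(next(csv.reader(io.StringIO(text)), []))
        groups[header].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample_files = {
        "jan.csv": "id,name\n1,Ada\n",
        "feb.csv": "id,name\n2,Grace\n",
        "notes.csv": "topic,body\nx,y\n",
    }
    for header, names in group_by_header(sample_files).items():
        print(header, names)
```

In this sample, jan.csv and feb.csv share a schema and notes.csv does not, so only the first two would be candidates for a single aggregated table.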