Establishing a Connection
The objects available within our connector are accessible from the "cdata.adls" module. To use the module's objects directly:
- Import the module as follows:
import cdata.adls as mod
- To establish a connection, call the module's connect() method with an appropriate connection string, such as:
mod.connect("Account=MyStorageAccount;FileSystem=MyBlobContainer;AccessKey=myAccessKey;")
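As a minimal sketch, the connection string can be assembled from a dictionary of properties rather than typed by hand. The helper name build_connection_string is hypothetical (it is not part of the connector), and the property values are placeholders. The connect() call is left commented out so the sketch runs without the connector installed.

```python
def build_connection_string(props):
    """Join key/value pairs into 'Key=Value;' form (hypothetical helper)."""
    return "".join(f"{key}={value};" for key, value in props.items())


conn_str = build_connection_string({
    "Account": "MyStorageAccount",
    "FileSystem": "MyBlobContainer",
    "AccessKey": "myAccessKey",
})
print(conn_str)

# With the connector installed, pass the string to connect():
# import cdata.adls as mod
# conn = mod.connect(conn_str)
```

Values that contain ";" or "=" may need quoting; this sketch assumes simple values and does not attempt it.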
Connecting to CData Python Connector for Azure Data Lake Storage Gen 2
To connect to an Azure Data Lake Storage Gen 2 account, set the following properties:
- Account: The name of the storage account.
- FileSystem: The file system name used for this account. For example, the name of an Azure Blob Container.
- Directory (Optional): The path to the location where the replicated file should be stored. If no path is specified, the file is stored in the root directory.
Authenticating to CData Python Connector for Azure Data Lake Storage Gen 2
CData Python Connector for Azure Data Lake Storage supports four authentication methods: an access key (AccessKey), a Shared Access Signature (SAS), Azure Active Directory OAuth (AzureAD), and Managed Service Identity (AzureMSI).
Access Key
To connect using an access key, you must first obtain an available access key for the ADLS Gen2 storage account. At the Azure portal:
- Go to your ADLS Gen2 Storage Account.
- Under Settings, select Access keys.
- Copy the value for one of the available access keys to the AccessKey connection property.
When you are ready to connect, set these properties:
- AuthScheme: AccessKey.
- AccessKey: The access key value you just retrieved from the Azure Portal.
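The following sketch shows one way to keep the access key out of source code by reading it from an environment variable. The variable name ADLS_ACCESS_KEY and the account values are assumptions for illustration; only the property names come from the list above.

```python
import os

# ADLS_ACCESS_KEY is a hypothetical variable name; the fallback is a placeholder.
access_key = os.environ.get("ADLS_ACCESS_KEY", "myAccessKey")

props = {
    "Account": "MyStorageAccount",
    "FileSystem": "MyBlobContainer",
    "AuthScheme": "AccessKey",
    "AccessKey": access_key,
}
conn_str = "".join(f"{key}={value};" for key, value in props.items())

# With the connector installed:
# import cdata.adls as mod
# conn = mod.connect(conn_str)
```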
Shared Access Signature (SAS)
To connect using a Shared Access Signature, you must first generate one using the Azure Storage Explorer tool.
When you are ready to connect, set these properties:
- AuthScheme: SAS.
- SharedAccessSignature: The value of the Shared Access Signature you just generated.
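Because each scheme expects a different credential property, a small pre-flight check can catch a missing value before connecting. This is a sketch only: missing_properties and REQUIRED_BY_SCHEME are hypothetical names, and the mapping covers just the two schemes described so far.

```python
# Credential property expected by each scheme, per the lists above.
REQUIRED_BY_SCHEME = {
    "AccessKey": ["AccessKey"],
    "SAS": ["SharedAccessSignature"],
}


def missing_properties(props):
    """Return the names of required properties that are absent or empty."""
    scheme = props.get("AuthScheme")
    required = ["Account", "FileSystem"] + REQUIRED_BY_SCHEME.get(scheme, [])
    return [name for name in required if not props.get(name)]


sas_props = {
    "Account": "MyStorageAccount",
    "FileSystem": "MyBlobContainer",
    "AuthScheme": "SAS",
    "SharedAccessSignature": "mySignature",  # placeholder value
}
print(missing_properties(sas_props))
```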
Azure AD
Azure AD is Microsoft’s multi-tenant, cloud-based directory and identity management service. It provides user-based authentication and requires that you set AuthScheme to AzureAD.
Authentication to Azure AD over a Web application always requires the creation of a custom OAuth application. For details, see Creating an Azure AD Application.
Desktop Applications
CData provides an embedded OAuth application that simplifies connection to Azure AD from a Desktop application.
You can also authenticate from a desktop application using a custom OAuth application. (For further information, see Creating an Azure AD Application.) To authenticate via Azure AD, set these parameters:
- AuthScheme: AzureAD.
- Custom applications only:
- OAuthClientId: The client Id assigned when you registered your custom OAuth application.
- OAuthClientSecret: The client secret assigned when you registered your custom OAuth application.
- CallbackURL: The redirect URI you defined when you registered your custom OAuth application.
When you connect, the connector opens Azure Data Lake Storage's OAuth endpoint in your default browser. Log in and grant permissions to the application.
The connector completes the OAuth process, obtaining an access token from Azure Data Lake Storage and using it to request data. The OAuth values are saved in the path specified in OAuthSettingsLocation. These values persist across connections.
When the access token expires, the connector refreshes it automatically.
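The difference between the embedded and custom application cases can be sketched as a function that adds the custom-only properties when they are supplied. The function name azure_ad_properties, the dictionary keys of custom_app, and all values are hypothetical; the property names are those listed above.

```python
def azure_ad_properties(custom_app=None):
    """Build Azure AD properties; custom_app is None for the embedded app."""
    props = {"AuthScheme": "AzureAD"}
    if custom_app is not None:
        # These three properties apply to custom OAuth applications only.
        props["OAuthClientId"] = custom_app["client_id"]
        props["OAuthClientSecret"] = custom_app["client_secret"]
        props["CallbackURL"] = custom_app["callback_url"]
    return props


embedded = azure_ad_properties()
custom = azure_ad_properties({
    "client_id": "myClientId",                 # placeholder
    "client_secret": "myClientSecret",         # placeholder
    "callback_url": "http://localhost:33333",  # placeholder redirect URI
})
print(embedded)
print(custom)
```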
Web Applications
To authenticate via Azure AD using a Web application, you must register a custom OAuth application with Azure Data Lake Storage (see Creating an Azure AD Application). You can then use the connector to get and manage the OAuth token values.
Get an Azure AD OAuth Access Token
First, set these connection properties to obtain the OAuthAccessToken:
- AuthScheme: AzureAD.
- OAuthClientId: The client Id in your application settings.
- OAuthClientSecret: The client secret in your application settings.
Next, call stored procedures to complete the OAuth exchange:
- Call the GetOAuthAuthorizationURL stored procedure. Set the AuthMode input to WEB and set the CallbackURL input to the Redirect URI you specified in your application settings. If necessary, set the Permissions parameter to request custom permissions.
The stored procedure returns the URL to the OAuth endpoint.
- Open the URL, log in, and authorize the application. You are redirected back to the callback URL.
- Call the GetOAuthAccessToken stored procedure. Set the AuthMode input to WEB. Set the Verifier input to the "code" parameter in the query string of the callback URL. If necessary, set the Permissions parameter to request custom permissions.
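Extracting the "code" parameter from the callback URL can be done with the Python standard library. The sketch below assumes a hypothetical redirect URL; extract_verifier is an illustrative name, and the stored procedure calls themselves are not shown.

```python
from urllib.parse import parse_qs, urlparse


def extract_verifier(callback_url):
    """Return the 'code' query parameter from the redirected callback URL."""
    query = parse_qs(urlparse(callback_url).query)
    codes = query.get("code")
    if not codes:
        raise ValueError("callback URL has no 'code' parameter")
    return codes[0]


# Hypothetical URL that the browser was redirected to after authorization.
redirected = "https://localhost:33333/callback?code=abc123&state=xyz"
verifier = extract_verifier(redirected)
print(verifier)
```

The returned value is what would be passed as the Verifier input to GetOAuthAccessToken.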
Once you have obtained the access and refresh tokens, you can connect to data and refresh the Azure AD access token either automatically or manually.
Automatic Refresh of the Azure AD OAuth Access Token
To have the connector automatically refresh the Azure AD OAuth access token, set the following parameters the first time you connect to data:
- AuthScheme: AzureAD.
- InitiateOAuth: REFRESH.
- OAuthClientId: The client Id in your application settings.
- OAuthClientSecret: The client secret in your application settings.
- OAuthAccessToken: The access token returned by GetOAuthAccessToken.
- OAuthRefreshToken: The refresh token returned by GetOAuthAccessToken.
- OAuthSettingsLocation: The location where the driver saves the OAuth token values, which persist across connections.
On subsequent data connections, the values for OAuthAccessToken and OAuthRefreshToken are taken from OAuthSettingsLocation, and do not need to be set on the connection.
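A sketch of the two cases, first connection and later connections, is shown below. The function name refresh_properties, the token dictionary keys, and the file path are assumptions for illustration; the token values would come from GetOAuthAccessToken.

```python
def refresh_properties(settings_location, client_id, client_secret, tokens=None):
    """Build properties for automatic refresh.

    Pass tokens on the first connection only; afterwards the values are
    read from OAuthSettingsLocation.
    """
    props = {
        "AuthScheme": "AzureAD",
        "InitiateOAuth": "REFRESH",
        "OAuthClientId": client_id,
        "OAuthClientSecret": client_secret,
        "OAuthSettingsLocation": settings_location,
    }
    if tokens is not None:
        props["OAuthAccessToken"] = tokens["access_token"]
        props["OAuthRefreshToken"] = tokens["refresh_token"]
    return props


path = "/path/to/OAuthSettings.txt"  # placeholder location
first = refresh_properties(path, "myClientId", "myClientSecret",
                           {"access_token": "tokenA", "refresh_token": "tokenR"})
later = refresh_properties(path, "myClientId", "myClientSecret")
print(sorted(set(first) - set(later)))
```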
Manual Refresh of the Azure AD OAuth Access Token
The only value required to manually refresh the Azure AD OAuth access token when connecting to data is the OAuth refresh token.
Use the RefreshOAuthAccessToken stored procedure to manually refresh the OAuthAccessToken after the ExpiresIn interval returned by GetOAuthAccessToken has elapsed. Before calling it, set these connection properties:
- OAuthClientId: The client Id in your application settings.
- OAuthClientSecret: The client secret in your application settings.
Now call RefreshOAuthAccessToken with OAuthRefreshToken set to the OAuth refresh token returned by GetOAuthAccessToken. After the new tokens have been retrieved, open a new connection by setting OAuthAccessToken to the value returned by RefreshOAuthAccessToken.
Finally, store the OAuth refresh token so that you can use it to manually refresh the OAuth access token after it has expired.
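Deciding when a manual refresh is due amounts to comparing the current time with the time the token was obtained plus ExpiresIn. The sketch below assumes ExpiresIn is expressed in seconds and adds a hypothetical safety margin; token_expired is an illustrative name, not a connector function.

```python
import time


def token_expired(obtained_at, expires_in, now=None, margin=60):
    """Return True when the token is within `margin` seconds of expiry.

    obtained_at and now are Unix timestamps; expires_in is assumed to be
    in seconds.
    """
    now = time.time() if now is None else now
    return now >= obtained_at + expires_in - margin


obtained_at = 1000.0  # placeholder timestamp recorded when the token was issued
print(token_expired(obtained_at, 3600, now=2000.0))
print(token_expired(obtained_at, 3600, now=4600.0))
```

When the function returns True, call RefreshOAuthAccessToken with the stored refresh token as described above.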
Headless Machines
To configure the driver with a user account on a headless machine, you must authenticate on another device that has an internet browser.
You can do this in either of the following ways:
- Obtain the OAuthVerifier value as described below in Option 1: Obtain and Exchange a Verifier Code.
- Install the connector on another machine as described below in Option 2: Transfer OAuth Settings. After you authenticate via the usual browser-based flow, transfer the OAuth authentication values.
Option 1: Obtain and Exchange a Verifier Code
- Find the authorization endpoint.
Custom applications only: Set these properties to create the Authorization URL:
- InitiateOAuth: OFF.
- OAuthClientId: The client Id assigned when you registered your application.
- OAuthClientSecret: The client secret assigned when you registered your application.
Custom and embedded applications: Call the GetOAuthAuthorizationURL stored procedure.
- Open the URL returned by the stored procedure in a browser.
- Log in and grant permissions to the connector. You are redirected to the callback URL, which contains the verifier code.
- Save the value of the verifier code. You will use this later to set the OAuthVerifier connection property.
- Exchange the OAuth verifier code for OAuth refresh and access tokens.
At the headless machine, set these properties:
- AuthScheme: AzureAD.
- InitiateOAuth: REFRESH.
- OAuthVerifier: The verifier code.
- OAuthSettingsLocation: The location of the file that holds the OAuth token values that persist across connections.
- Custom applications only:
- OAuthClientId: The client Id in your custom OAuth application settings.
- OAuthClientSecret: The client secret in the custom OAuth application settings.
- After the OAuth settings file is generated, reset the following properties to connect:
- InitiateOAuth: REFRESH.
- OAuthSettingsLocation: The location containing the encrypted OAuth authentication values. Make sure this location grants read and write permissions to the connector to enable the automatic refreshing of the access token.
- Custom applications only:
- OAuthClientId: The client Id assigned when you registered your application.
- OAuthClientSecret: The client secret assigned when you registered your application.
Option 2: Transfer OAuth Settings
Before you can connect via a headless machine, you must install the driver and create a connection on a device that has an internet browser. Set the connection properties as described above, in Desktop Applications.
After you complete the instructions in Desktop Applications, the resulting authentication values are encrypted and written to the location specified by OAuthSettingsLocation. The default filename is OAuthSettings.txt.
Once you have successfully tested the connection, copy the OAuth settings file to your headless machine.
At the headless machine, set these properties:
- AuthScheme: AzureAD.
- InitiateOAuth: REFRESH.
- OAuthSettingsLocation: The location of your OAuth settings file. Make sure this location gives read and write permissions to the connector to enable the automatic refreshing of the access token.
- Custom applications only:
- OAuthClientId: The client Id assigned when you registered your application.
- OAuthClientSecret: The client secret assigned when you registered your application.
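The copy-and-check step can be sketched as follows. Temporary directories stand in for the two machines, so this only illustrates the permission check; an actual transfer between machines would use whatever file-copy mechanism is available. The name install_settings_file is hypothetical.

```python
import os
import shutil
import tempfile


def install_settings_file(source_path, target_dir):
    """Copy the settings file and confirm it is readable and writable."""
    os.makedirs(target_dir, exist_ok=True)
    target_path = shutil.copy(source_path, target_dir)
    if not os.access(target_path, os.R_OK | os.W_OK):
        raise PermissionError(f"{target_path} must be readable and writable")
    return target_path


# Temporary directories stand in for the desktop and headless machines.
with tempfile.TemporaryDirectory() as desktop, tempfile.TemporaryDirectory() as headless:
    source = os.path.join(desktop, "OAuthSettings.txt")
    with open(source, "w") as handle:
        handle.write("placeholder for encrypted OAuth values")
    copied = install_settings_file(source, headless)
    copied_name = os.path.basename(copied)
    copied_ok = os.path.exists(copied)

print(copied_name)
```

The returned path is the value to use for OAuthSettingsLocation on the headless machine.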
Managed Service Identity (MSI)
If you are connecting to Azure Data Lake Storage from an Azure VM and want to leverage MSI to connect, set AuthScheme to AzureMSI.
User-Managed Identities
To obtain a token for a managed identity, use the OAuthClientId property to specify the managed identity's "client_id".
When your VM has multiple user-assigned managed identities, you must also specify OAuthClientId.
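As a sketch, the MSI properties differ only in whether OAuthClientId is included. The function name msi_properties and the client id value are placeholders.

```python
def msi_properties(client_id=None):
    """Build MSI properties; pass client_id to select a user-assigned identity."""
    props = {"AuthScheme": "AzureMSI"}
    if client_id:
        props["OAuthClientId"] = client_id
    return props


default_identity = msi_properties()
user_identity = msi_properties("00000000-0000-0000-0000-000000000000")  # placeholder
print(default_identity)
print(user_identity)
```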