The CData Sync App provides a straightforward way to continuously pipeline your Parquet data to any database, data lake, or data warehouse, making it easily available for Analytics, Reporting, AI, and Machine Learning.
The Parquet connector can be used from the CData Sync application to pull data from Parquet and move it to any of the supported destinations.
The Sync App leverages the Apache Parquet API V2.0. The Sync App supports the following compression encodings when parsing Parquet files: ZSTD, Gzip, Snappy, and uncompressed.
The CData Sync App is designed for streaming Parquet only.
This streamed file content does not include all of the metadata associated with remotely stored Parquet files, such as file and folder name.
If access to both the file metadata and the actual file content is needed, then the CData Sync App must be used in tandem with the associated file system driver(s) for the service the Parquet files are remotely stored in.
The following file system drivers are available:
See the relevant CData file system driver's documentation for a configuration guide for connecting to stored Parquet file metadata.
For required properties, see the Settings tab.
For connection properties that are not typically required, see the Advanced tab.
The CData Sync App allows connecting to local and remote Parquet resources. Set the URI property to the Parquet resource location, in addition to any other properties necessary to connect to your data source.
Set the ConnectionType to Local. Local files support SELECT and INSERT.
Set the URI to a folder containing Parquet files: C:\folder1.
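For example (a minimal sketch; the folder path is a placeholder):
ConnectionType=Local;URI=C:\folder1;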
You can also connect to multiple Parquet files which share the same schema. Below is an example connection string:
URI=C:\folder; AggregateFiles=True;
If you would prefer to expose all of the individual Parquet files as tables instead, leave this property False.
URI=C:\folder; AggregateFiles=False;
If you need to INSERT, UPDATE, or DELETE cloud files, you can download the file with the CData Sync App for that cloud host (supported via stored procedures), make changes to the local file with the Parquet Sync App, and then upload the changed file using the cloud host's stored procedures.
As an example, if you wanted to update a file stored on SharePoint, you could use the CData SharePoint Sync App's DownloadDocument procedure to download the Parquet file, update the local Parquet file with the CData Parquet Sync App, then use the SharePoint Sync App's UploadDocument procedure to upload the changed file to SharePoint.
A unique prefix at the beginning of the URI connection property is used to identify the cloud data store being targeted by the Sync App, and the remainder of the path is a relative path to the desired folder (one table per file) or single file (a single table).
Set the following to identify your Parquet resources stored on Amazon S3:
See Connecting to Amazon S3 for more information regarding how to connect and authenticate to Parquet files hosted on Amazon S3.
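A representative connection string is shown below (the s3:// URI scheme and all bucket, key, and region values are illustrative placeholders; see Connecting to Amazon S3 for the exact format and authentication options):
ConnectionType=Amazon S3;URI=s3://bucket1/folder1;AWSAccessKey=myAccessKey;AWSSecretKey=mySecretKey;AWSRegion=us-east-1;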
Set the following to identify your Parquet resources stored on Azure Blob Storage:
See Connecting to Azure Blob Storage for more information regarding how to connect and authenticate to Parquet files hosted on Azure Blob Storage.
Set the following to identify your Parquet resources stored on Azure Data Lake Storage:
See Connecting to Azure Data Lake Storage for more information regarding how to connect and authenticate to Parquet files hosted on Azure Data Lake Storage.
Set the following properties to connect:
You can authenticate with either an Azure access key or an Azure shared access signature. Set one of the following:
Set the following to identify your Parquet resources stored on Box:
See Connecting to Box for more information regarding how to connect and authenticate to Parquet files hosted on Box.
Set the following to identify your Parquet resources stored on Dropbox:
See Connecting to Dropbox for more information regarding how to connect and authenticate to Parquet files hosted on Dropbox.
The Sync App supports both plaintext and SSL/TLS connections to FTP servers.
Set the following connection properties to connect:
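A representative connection string is shown below (the host, port, and credentials are illustrative placeholders):
ConnectionType=FTP;URI=ftp://ftp.example.com:21/folder1;User=myUser;Password=myPassword;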
Set the following to identify your Parquet resources stored on Google Cloud Storage:
See Connecting to Google Cloud Storage for more information regarding how to connect and authenticate to Parquet files hosted on Google Cloud Storage.
Set the following to identify your Parquet resources stored on Google Drive:
See Connecting to Google Drive for more information regarding how to connect and authenticate to Parquet files hosted on Google Drive.
Set the following to identify your Parquet resources stored on HDFS:
There are two authentication methods available for connecting to an HDFS data source: Anonymous authentication and Negotiate (Kerberos) authentication.
Anonymous Authentication
In some situations, you can connect to HDFS without any authentication connection properties. To do so, set the AuthScheme property to None (default).
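A representative connection string for anonymous access (the webhdfs:// URI scheme, host, and port shown are illustrative placeholders):
ConnectionType=HDFS;URI=webhdfs://host:port/remotePath;AuthScheme=None;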
Authenticate using Kerberos
When authentication credentials are required, you can use Kerberos for authentication. See Using Kerberos for details on how to authenticate with Kerberos.
Set the following to identify your Parquet resources stored on HTTP streams:
See Connecting to HTTP Streams for more information regarding how to connect and authenticate to Parquet files hosted on HTTP Streams.
Set the following to identify your Parquet resources stored on IBM Cloud Object Storage:
See Connecting to IBM Object Storage for more information regarding how to connect and authenticate to Parquet files hosted on IBM Cloud Object Storage.
Set the following to identify your Parquet resources stored on OneDrive:
See Connecting to OneDrive for more information regarding how to connect and authenticate to Parquet files hosted on OneDrive.
Set the following properties to authenticate with HMAC:
Set the following to identify your Parquet resources stored on SFTP:
See Connecting to SFTP for more information regarding how to connect and authenticate to Parquet files hosted on SFTP.
Set the following to identify your Parquet resources stored on SharePoint Online:
See Connecting to SharePoint Online for more information regarding how to connect and authenticate to Parquet files hosted on SharePoint Online.
Specify the following to connect to data:
There are several authentication methods available for connecting to Parquet including:
To authenticate using account root credentials, set these configuration parameters:
Note: Use of this authentication scheme is discouraged by Amazon for anything but simple tests. The account root credentials have the full permissions of the user, making this the least secure authentication method.
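A representative connection string for this scheme (the AuthScheme value and all key values shown are illustrative placeholders; consult the AuthScheme property documentation for the exact value):
AuthScheme=AwsRootKeys;AWSAccessKey=myAccessKey;AWSSecretKey=mySecretKey;AWSRegion=us-east-1;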
Set AuthScheme to AwsEC2Roles.
If you are using the Sync App from an EC2 Instance and have an IAM Role assigned to the instance, you can use the IAM Role to authenticate. Since the Sync App automatically obtains your IAM Role credentials and authenticates with them, it is not necessary to specify AWSAccessKey and AWSSecretKey.
If you are also using an IAM role to authenticate, you must additionally specify the following:
The Parquet Sync App now supports IMDSv2. Unlike IMDSv1, the new version requires an authentication token. Endpoints and responses are the same in both versions.
In IMDSv2, the Parquet Sync App first attempts to retrieve the IMDSv2 metadata token and then uses it to call AWS metadata endpoints. If it is unable to retrieve the token, the Sync App reverts to IMDSv1.
Set AuthScheme to AwsIAMRoles.
In many situations, it may be preferable to use an IAM role for authentication instead of the direct security credentials of an AWS root user. Note that roles cannot be used if you are specifying the AWSAccessKey and AWSSecretKey of an AWS root user.
To authenticate as an AWS role, set these properties:
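For example (a sketch; the role ARN and external Id values are placeholders):
AuthScheme=AwsIAMRoles;AWSRoleARN=arn:aws:iam::123456789012:role/MyRole;AWSExternalId=myExternalId;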
To connect to ADFS, set the AuthScheme to ADFS, and set these properties:
To authenticate to ADFS, set these SSOProperties:
Example connection string:
AuthScheme=ADFS;User=username;Password=password;SSOLoginURL='https://sts.company.com';SSOProperties='RelyingParty=https://saml.salesforce.com';
The ADFS Integrated flow indicates you are connecting with the currently logged in Windows user credentials. To use the ADFS Integrated flow, do not specify the User and Password, but otherwise follow the same steps in the ADFS guide above.
To connect to Okta, set the AuthScheme to Okta, and set these properties:
If you are using a trusted application or proxy that overrides the Okta client request OR configuring MFA, you must use combinations of SSOProperties to authenticate using Okta. Set any of the following, as applicable:
Example connection string:
AuthScheme=Okta;SSOLoginURL='https://example.okta.com/home/appType/0bg4ivz6cJRZgCz5d6/46';User=oktaUserName;Password=oktaPassword;
To connect to PingFederate, set AuthScheme to PingFederate, and set these properties:
To enable mutual SSL authentication for SSOLoginURL, the WS-Trust STS endpoint, configure these SSOProperties:
Example connection string:
authScheme=pingfederate;SSOLoginURL=https://mycustomserver.com:9033/idp/sts.wst;SSOExchangeUrl=https://us-east-1.signin.aws.amazon.com/platform/saml/acs/764ef411-xxxxxx;user=admin;password=PassValue;AWSPrincipalARN=arn:aws:iam::215338515180:saml-provider/pingFederate;AWSRoleArn=arn:aws:iam::215338515180:role/SSOTest2;
For users and roles that require multi-factor authentication, specify the following:
Note: If you want to control the duration of the temporary credentials, set the TemporaryTokenDuration property (default: 3600 seconds).
To authenticate using temporary credentials, specify the following:
The Sync App can now request resources using the same permissions provided by long-term credentials (such as IAM user credentials) for the lifespan of the temporary credentials.
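For example (a sketch; the AuthScheme value and all key and token values shown are illustrative placeholders):
AuthScheme=TemporaryCredentials;AWSAccessKey=myTempAccessKey;AWSSecretKey=myTempSecretKey;AWSSessionToken=mySessionToken;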
To authenticate using both temporary credentials and an IAM role, set all the parameters described above, and specify these additional parameters:
You can use a credentials file to authenticate. Any configurations related to AccessKey/SecretKey authentication, temporary credentials, role authentication, or MFA can be used. To do so, set the following properties to authenticate:
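For example (a sketch; the AuthScheme value, file path, and profile name shown are illustrative placeholders):
AuthScheme=AwsCredentialsFile;AWSCredentialsFile=C:\Users\me\.aws\credentials;AWSCredentialsFileProfile=default;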
This configuration requires two separate Azure AD applications:
To connect to Azure AD, set the AuthScheme to AzureAD, and set these properties:
To authenticate to Azure AD, set these SSOProperties:
Example connection string:
AuthScheme=AzureAD;OAuthClientId=3ea1c786-d527-4399-8c3b-2e3696ae4b48;OauthClientSecret=xxx;CallbackUrl=https://localhost:33333;SSOProperties='Resource=https://signin.aws.amazon.com/saml;AzureTenant=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';
To obtain the credentials for an AzureBlob user, follow the steps below:
Set AzureStorageAccount to your Azure Blob Storage account name.
You can authenticate to Azure Blob Storage via Access Key, Shared Access Signatures (SAS), AzureAD user, Azure MSI, or Azure Service Principal.
Set the following to authenticate with an Azure Access Key:
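For example (a sketch; the azureblob:// URI scheme, account name, and key shown are illustrative placeholders):
ConnectionType=Azure Blob Storage;URI=azureblob://container1/folder1;AzureStorageAccount=myAccount;AzureAccessKey=myKey;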
AuthScheme must be set to AzureAD in all user account flows.
The authentication as an Azure Service Principal is handled via the OAuth Client Credentials flow. It does not involve direct user authentication. Instead, credentials are created for just the application itself. All tasks taken by the app are done without a default user context, but based on the assigned roles. The application access to the resources is controlled through the assigned roles' permissions.
Create an AzureAD App and an Azure Service Principal
When authenticating using an Azure Service Principal, you must create and register an Azure AD application with an Azure AD tenant. See Creating a Custom OAuth Application for more details.
In your App Registration in portal.azure.com, navigate to API Permissions and select the Microsoft Graph permissions. There are two distinct sets of permissions: Delegated permissions and Application permissions. The permissions used during client credential authentication are under Application Permissions.
Assign a role to the application
To access resources in your subscription, you must assign a role to the application.
Client Secret
Set these connection properties:
Certificate
Set these connection properties:
You are now ready to connect. Authentication with client credentials takes place automatically like any other connection, except there is no window opened prompting the user. Because there is no user context, there is no need for a browser popup. Connections take place and are handled internally.
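A representative connection string for the client secret flow is shown below (the AuthScheme value, tenant, and credentials shown are illustrative placeholders):
AuthScheme=AzureServicePrincipal;AzureTenant=mytenant.onmicrosoft.com;OAuthClientId=myApplicationId;OAuthClientSecret=myClientSecret;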
If you are connecting from an Azure VM with permissions for Azure Data Lake Storage, set AuthScheme to AzureMSI.
If you would like to authenticate with a service principal instead of a client secret, it is also possible to authenticate with a client certificate. Set the following to authenticate:
There are two types of custom AzureAD applications: AzureAD and AzureAD with an Azure Service Principal. Both are OAuth-based.
You may choose to use your own AzureAD Application Credentials when you want to
Follow the steps below to obtain the AzureAD values for your application, the OAuthClientId and OAuthClientSecret.
When authenticating using an Azure Service Principal, you must create both a custom AzureAD application and a service principal that can access the necessary resources. Follow the steps below to create a custom AzureAD application and obtain the connection properties for Azure Service Principal authentication.
Follow the steps below to obtain the AzureAD values for your application.
Set AzureStorageAccount to your Azure Data Lake Storage account name.
You can authenticate to Azure Data Lake Storage via Access Key, Shared Access Signature (SAS), AzureAD user, Azure MSI, or Azure Service Principal.
Set the following to authenticate with an Azure Access Key:
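For example (a sketch; the ConnectionType value, abfss:// URI scheme, account name, and key shown are illustrative placeholders; see Connecting to Azure Data Lake Storage for the exact format):
ConnectionType=Azure Data Lake Storage;URI=abfss://myfilesystem/folder1;AzureStorageAccount=myAccount;AzureAccessKey=myKey;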
AuthScheme must be set to AzureAD in all user account flows.
The authentication as an Azure Service Principal is handled via the OAuth Client Credentials flow. It does not involve direct user authentication. Instead, credentials are created for just the application itself. All tasks taken by the app are done without a default user context, but based on the assigned roles. The application access to the resources is controlled through the assigned roles' permissions.
Create an AzureAD App and an Azure Service Principal
When authenticating using an Azure Service Principal, you must create and register an Azure AD application with an Azure AD tenant. See Creating a Custom OAuth Application for more details.
In your App Registration in portal.azure.com, navigate to API Permissions and select the Microsoft Graph permissions. There are two distinct sets of permissions: Delegated permissions and Application permissions. The permissions used during client credential authentication are under Application Permissions.
Assign a role to the application
To access resources in your subscription, you must assign a role to the application.
Client Secret
Set these connection properties:
Certificate
Set these connection properties:
You are now ready to connect. Authentication with client credentials takes place automatically like any other connection, except there is no window opened prompting the user. Because there is no user context, there is no need for a browser popup. Connections take place and are handled internally.
If you are connecting from an Azure VM with permissions for Azure Data Lake Storage, set AuthScheme to AzureMSI.
If you would like to authenticate with a service principal instead of a client secret, it is also possible to authenticate with a client certificate. Set the following to authenticate:
There are two types of custom AzureAD applications: AzureAD and AzureAD with an Azure Service Principal. Both are OAuth-based.
You may choose to use your own AzureAD Application Credentials when you want to
Follow the steps below to obtain the AzureAD values for your application, the OAuthClientId and OAuthClientSecret.
When authenticating using an Azure Service Principal, you must create both a custom AzureAD application and a service principal that can access the necessary resources. Follow the steps below to create a custom AzureAD application and obtain the connection properties for Azure Service Principal authentication.
Follow the steps below to obtain the AzureAD values for your application.
Use the OAuth authentication standard to connect to Box. You can authenticate with a user account or with a service account. A service account is required to grant organization-wide access scopes to the Sync App. The Sync App facilitates these authentication flows as described below.
AuthScheme must be set to OAuth in all user account flows.
Set the AuthScheme to OAuthJWT to authenticate with this method.
Service accounts have silent authentication, without user authentication in the browser. You can also use a service account to delegate enterprise-wide access scopes to the Sync App.
You need to create an OAuth application in this flow. See Create a Custom OAuth App to create and authorize an app. You can then connect to Box data that the service account has permission to access.
After setting the following connection properties, you are ready to connect:
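A representative connection string is shown below (the box:// URI scheme, certificate type value, path, and password shown are illustrative placeholders):
ConnectionType=Box;URI=box://folder1;AuthScheme=OAuthJWT;OAuthJWTCertType=PEMKEY_FILE;OAuthJWTCert=C:\certs\box_private_key.pem;OAuthJWTCertPassword=myCertPassword;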
You may choose to use your own OAuth Application Credentials when you want to:
At the Box Enterprise Developer Console:
Note: Box does not back up private keys for security reasons. Be careful to back up the Public/Private JSON file. If you lose your private key, you must reset the entire keypair.
openssl genrsa -des3 -out private.pem 2048
openssl rsa -in private.pem -outform PEM -pubout -out public.pem
Note: To run OpenSSL in a Windows environment, install the Cygwin package.
After your application is created and registered, click Configuration from the main menu to access your settings. Note the displayed Redirect URI, Client ID, and Client Secret. You will need these values later.
If you change the JWT access scopes, you must reauthorize the application in the enterprise admin console:
Dropbox uses the OAuth authentication standard.
You need to choose between using CData's embedded OAuth app or Create a Custom OAuth App.
The embedded app includes the following scopes:
You may choose to use your own OAuth Application Credentials when you want to
No further values need to be specified in the Parquet app settings.
Set the ProjectId property to the Id of the project you want to connect to.
The Sync App supports using user accounts and GCP instance accounts for authentication.
The following sections discuss the available authentication schemes for Google Cloud Storage:
AuthScheme must be set to OAuth in all user account flows.
Get an OAuth Access Token
Set the following connection properties to obtain the OAuthAccessToken:
Then call stored procedures to complete the OAuth exchange:
Once you have obtained the access and refresh tokens, you can connect to data and refresh the OAuth access token either automatically or manually.
Automatic Refresh of the OAuth Access Token
To have the driver automatically refresh the OAuth access token, set the following on the first data connection:
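For example, a representative first connection (a sketch; the gs:// URI scheme, client credentials, and token values shown are illustrative placeholders):
ConnectionType=Google Cloud Storage;URI=gs://bucket1/folder1;InitiateOAuth=REFRESH;OAuthClientId=myClientId;OAuthClientSecret=myClientSecret;OAuthAccessToken=myAccessToken;OAuthRefreshToken=myRefreshToken;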
Manual Refresh of the OAuth Access Token
The only value needed to manually refresh the OAuth access token when connecting to data is the OAuth refresh token.
Use the RefreshOAuthAccessToken stored procedure to manually refresh the OAuthAccessToken after the ExpiresIn parameter value returned by GetOAuthAccessToken has elapsed, then set the following connection properties:
Then call RefreshOAuthAccessToken with OAuthRefreshToken set to the OAuth refresh token returned by GetOAuthAccessToken. After the new tokens have been retrieved, open a new connection by setting the OAuthAccessToken property to the value returned by RefreshOAuthAccessToken.
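A sketch of the manual refresh call is shown below (the exact EXECUTE syntax depends on your SQL tooling, and the token value is a placeholder):
EXECUTE RefreshOAuthAccessToken OAuthRefreshToken = 'myRefreshToken'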
Finally, store the OAuth refresh token so that you can use it to manually refresh the OAuth access token after it has expired.
Option 1: Obtain and Exchange a Verifier Code
To obtain a verifier code, you must authenticate at the OAuth authorization URL.
Follow the steps below to authenticate from the machine with an internet browser and obtain the OAuthVerifier connection property.
On the headless machine, set the following connection properties to obtain the OAuth authentication values:
After the OAuth settings file is generated, you need to re-set the following properties to connect:
Option 2: Transfer OAuth Settings
Prior to connecting on a headless machine, you need to create and install a connection with the driver on a device that supports an internet browser. Set the connection properties as described in "Desktop Applications" above.
After completing the instructions in "Desktop Applications", the resulting authentication values are encrypted and written to the location specified by OAuthSettingsLocation. The default filename is OAuthSettings.txt.
Once you have successfully tested the connection, copy the OAuth settings file to your headless machine.
On the headless machine, set the following connection properties to connect to data:
When running on a GCP virtual machine, the Sync App can authenticate using a service account tied to the virtual machine. To use this mode, set AuthScheme to GCPInstanceAccount.
(For information on getting and setting the OAuthAccessToken and other configuration parameters, see the Desktop Authentication section of "Connecting to Parquet".)
However, you must create a custom OAuth application to connect to Parquet via the Web. And since custom OAuth applications seamlessly support all three commonly-used auth flows, you might want to create custom OAuth applications (use your own OAuth Application Credentials) for those auth flows anyway.
Custom OAuth applications are useful if you want to:
The following sections describe how to enable the Directory API and create custom OAuth applications for user accounts (OAuth) and Service Accounts (OAuth/JWT).
For users whose AuthScheme is OAuth and who need to authenticate over a web application, you must always create a custom OAuth application. (For desktop and headless flows, creating a custom OAuth application is optional.)
Do the following:
Note: The client secret remains accessible from the Google Cloud Console.
To create a new service account:
To complete the service account flow, generate a private key in the Google Cloud Console. In the service account flow, the driver exchanges a JSON Web token (JWT) for the OAuthAccessToken. The private key is required to sign the JWT. The driver will have the same permissions granted to the service account.
The Sync App supports using user accounts and GCP instance accounts for authentication.
The following sections discuss the available authentication schemes for Google Drive:
AuthScheme must be set to OAuth in all user account flows.
When running on a GCP virtual machine, the Sync App can authenticate using a service account tied to the virtual machine. To use this mode, set AuthScheme to GCPInstanceAccount.
(For information on getting and setting the OAuthAccessToken and other configuration parameters, see the Desktop Authentication section of "Connecting to Parquet".)
However, you must create a custom OAuth application to connect to Parquet via the Web. And since custom OAuth applications seamlessly support all three commonly-used auth flows, you might want to create custom OAuth applications (use your own OAuth Application Credentials) for those auth flows anyway.
Custom OAuth applications are useful if you want to:
The following sections describe how to enable the Directory API and create custom OAuth applications for user accounts (OAuth) and Service Accounts (OAuth/JWT).
For users whose AuthScheme is OAuth and who need to authenticate over a web application, you must always create a custom OAuth application. (For desktop and headless flows, creating a custom OAuth application is optional.)
Do the following:
Note: The client secret remains accessible from the Google Cloud Console.
To create a new service account:
To complete the service account flow, generate a private key in the Google Cloud Console. In the service account flow, the driver exchanges a JSON Web token (JWT) for the OAuthAccessToken. The private key is required to sign the JWT. The driver will have the same permissions granted to the service account.
The Sync App generically supports connecting to Parquet data stored on HTTP(S) streams.
Several authentication methods are supported, such as user/password, digest access, OAuth, OAuthJWT, and the OAuth PASSWORD flow.
You can also connect to streams that have no authentication set up.
Connect to an HTTP(S) stream with no authentication by setting the AuthScheme connection property to None.
Set the following to connect:
Set the following to connect:
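For example, a representative connection string for basic user/password authentication (the AuthScheme value, URL, and credentials shown are illustrative placeholders):
ConnectionType=HTTP;URI=http://www.example.com/data/myfile.parquet;AuthScheme=Basic;User=myUser;Password=myPassword;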
OAuth requires the authenticating user to interact with Parquet using the browser. The Sync App facilitates this in various ways as described in the following sections.
Before following the procedures below, you need to register an OAuth app with the service containing the Parquet data you want to work with.
Creating a custom application in most services requires registering as a developer and creating an app in the UI of the service.
This is not necessarily true for all services. In some you must contact the service provider to create the app for you. However it is done, you must obtain the values for OAuthClientId, OAuthClientSecret, and CallbackURL.
Set AuthScheme to OAuthJWT.
The Sync App supports using JWT as an authorization grant in situations where a user cannot perform an interactive sign-on. After setting the following connection properties, you are ready to connect:
Note that the JWT signature algorithm cannot be set directly. The Sync App only supports the RS256 algorithm.
The Sync App will then construct a JWT including the following fields, and submit it to OAuthAccessTokenURL for an access token.
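A representative connection string for the JWT grant (a sketch; the certificate type value, path, and token URL shown are illustrative placeholders):
AuthScheme=OAuthJWT;OAuthJWTCertType=PEMKEY_FILE;OAuthJWTCert=C:\certs\service_key.pem;OAuthAccessTokenURL=https://myservice.example.com/oauth/token;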
AuthScheme: Set this to OAuthPassword.
OAuth requires the authenticating user to interact with Parquet using the browser. The Sync App facilitates this in various ways as described in the following sections.
Before following the procedures below, you need to register an OAuth app with the service containing the Parquet data you want to work with.
Creating a custom application in most services requires registering as a developer and creating an app in the UI of the service.
This is not necessarily true for all services. In some you must contact the service provider to create the app for you. However it is done, you must obtain the values for OAuthClientId, OAuthClientSecret, and CallbackURL.
After setting the following connection properties, you are ready to connect:
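For example (a sketch; all endpoint and credential values shown are illustrative placeholders):
AuthScheme=OAuthPassword;User=myUser;Password=myPassword;OAuthClientId=myClientId;OAuthClientSecret=myClientSecret;OAuthAccessTokenURL=https://myservice.example.com/oauth/token;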
If you do not already have Cloud Object Storage in your IBM Cloud account, you can follow the procedure below to install an instance of SQL Query in your account:
To connect with IBM Cloud Object Storage, you will need an ApiKey. You can obtain this as follows:
Set Region to your IBM instance region.
You can authenticate to IBM Cloud Object Storage using either HMAC or OAuth authentication.
Set the following properties to authenticate:
ConnectionType=IBM Object Storage Source;URI=ibmobjectstorage://bucket1/folder1; AccessKey=token1; SecretKey=secret1; Region=eu-gb;
Set the following to authenticate using OAuth authentication.
ConnectionType=IBM Object Storage Source;URI=ibmobjectstorage://bucket1/folder1; ApiKey=key1; Region=eu-gb; AuthScheme=OAuth; InitiateOAuth=GETANDREFRESH;
When you connect, the Sync App completes the OAuth process.
You can connect to OneDrive using an AzureAD user, with MSI authentication, or using an Azure Service Principal.
AuthScheme must be set to AzureAD in all user account flows.
The authentication as an Azure Service Principal is handled via the OAuth Client Credentials flow. It does not involve direct user authentication. Instead, credentials are created for just the application itself. All tasks taken by the app are done without a default user context, but based on the assigned roles. The application access to the resources is controlled through the assigned roles' permissions.
Create an AzureAD App and an Azure Service Principal
When authenticating using an Azure Service Principal, you must create and register an Azure AD application with an Azure AD tenant. See Creating a Custom OAuth Application for more details.
In your App Registration in portal.azure.com, navigate to API Permissions and select the Microsoft Graph permissions. There are two distinct sets of permissions: Delegated permissions and Application permissions. The permissions used during client credential authentication are under Application Permissions.
Assign a role to the application
To access resources in your subscription, you must assign a role to the application.
Client Secret
Set these connection properties:
Certificate
Set these connection properties:
You are now ready to connect. Authentication with client credentials takes place automatically like any other connection, except there is no window opened prompting the user. Because there is no user context, there is no need for a browser popup. Connections take place and are handled internally.
If you are connecting from an Azure VM with permissions for Azure Data Lake Storage, set AuthScheme to AzureMSI.
If you would like to authenticate with a service principal instead of a client secret, it is also possible to authenticate with a client certificate. Set the following to authenticate:
There are two types of custom AzureAD applications: AzureAD and AzureAD with an Azure Service Principal. Both are OAuth-based.
You may choose to use your own AzureAD Application Credentials when you want to
Follow the steps below to obtain the AzureAD values for your application, the OAuthClientId and OAuthClientSecret.
When authenticating using an Azure Service Principal, you must create both a custom AzureAD application and a service principal that can access the necessary resources. Follow the steps below to create a custom AzureAD application and obtain the connection properties for Azure Service Principal authentication.
Follow the steps below to obtain the AzureAD values for your application.
You can authenticate to SFTP using a user and password or an SSH certificate. Additionally, you can connect to an SFTP server that has no authentication enabled.
Set SSHAuthMode to None to connect without authentication, assuming your server supports doing so.
Provide user credentials associated with your SFTP server:
Set the following to connect.
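A representative SFTP connection string is shown below (the sftp:// URI scheme, host, port, and credentials shown are illustrative placeholders):
ConnectionType=SFTP;URI=sftp://sftp.example.com:22/folder1;User=myUser;Password=myPassword;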
Service provider | Okta | OneLogin | ADFS | AzureAD |
Amazon S3 | Y | Y | Y | |
Azure Blob Storage | | | | |
Azure Data Lake Store Gen1 | | | | |
Azure Data Lake Store Gen2 | | | | |
Azure Data Lake Store Gen2 with SSL | | | | |
Google Drive | | | | |
OneDrive | | | | |
Box | | | | |
Dropbox | | | | |
SharePoint Online SOAP | Y | Y | Y | |
SharePoint Online REST | | | | |
Wasabi | | | | |
Google Cloud Storage | | | | |
Oracle Cloud Storage | | | | |
Azure File | | | | |
Azure AD Configuration
This configuration is built around the OAuth 2.0 On-Behalf-Of flow. It requires two Azure AD applications:
Save the step "Assign the Azure AD test user" until after provisioning so that you can select the AWS roles when assigning the user.
CData Driver Common Properties
The following SSOProperties are needed to authenticate to Azure Active Directory and must be specified for every service provider.
The SSO SAML response is retrieved from an OAuth 2.0 On-Behalf-Of flow, so the following OAuth connection properties must be specified:
Amazon S3
In addition to the common properties, the following properties must be specified when connecting to Amazon S3 service provider:
AuthScheme=AzureAD;InitiateOAuth=GETANDREFRESH;OAuthClientId=d593a1d-ad89-4457-872d-8d7443aaa655;OauthClientSecret=g9-oy5D_rl9YEKfN-45~3Wm8FgVa2F;SSOProperties='Tenant=94be7-edb4-4fda-ab12-95bfc22b232f;Resource=https://signin.aws.amazon.com/saml;';AWSRoleARN=arn:aws:iam::2153385180:role/AWS_AzureAD;AWSPrincipalARN=arn:aws:iam::215515180:saml-provider/AzureAD;
OneLogin Configuration
You must create an application used for the single sign-on process to a specific provider.
Sharepoint SOAP
The following properties must be specified when connecting to Sharepoint SOAP service provider:
AuthScheme='OneLogin';User=test;Password=test;SSOProperties='Domain=test.cdata;';
Okta Configuration
You must create an application used for the single sign-on process to a specific provider.
Sharepoint SOAP
The following properties must be specified when connecting to Sharepoint SOAP service provider:
AuthScheme='Okta';User=test;Password=test;SSOProperties='Domain=test.cdata;';
Amazon S3
The following properties must be specified when connecting to an Amazon S3 service provider:
AuthScheme=Okta;User=OktaUser;Password=OktaPassword;SSOLoginURL='https://{subdomain}.okta.com/home/amazon_aws/0oan2hZLgQiy5d6/272';
ADFS Configuration
You must create an application used for the single sign-on process to a specific provider.
Sharepoint SOAP
The following properties must be specified when connecting to a Sharepoint SOAP service provider:
AuthScheme='ADFS';User=test;Password=test;SSOProperties='Domain=test.cdata;';
Amazon S3
The following properties must be specified when connecting to an Amazon S3 service provider:
AuthScheme=ADFS;User=username;Password=password;SSOLoginURL='https://sts.company.com';
ADFS Integrated
The ADFS Integrated flow indicates you are connecting with the currently logged in Windows user credentials. To use the ADFS Integrated flow, simply do not specify the User and Password, but otherwise follow the same steps in the ADFS guide above.
You can use the following properties to gain more control over the data returned from Parquet:
To authenticate to Parquet with Kerberos, set AuthScheme to NEGOTIATE.
Authenticating to Parquet via Kerberos requires you to define authentication properties and to choose how Kerberos should retrieve authentication tickets.
The Sync App provides three ways to retrieve the required Kerberos ticket, depending on whether or not the KRB5CCNAME and/or KerberosKeytabFile variables exist in your environment.
MIT Kerberos Credential Cache File
This option enables you to use the MIT Kerberos Ticket Manager or kinit command to get tickets. With this option there is no need to set the User or Password connection properties.
This option requires that KRB5CCNAME has been created in your system.
To enable ticket retrieval via MIT Kerberos Credential Cache Files:
If the ticket is successfully obtained, the ticket information appears in Kerberos Ticket Manager and is stored in the credential cache file.
The Sync App uses the cache file to obtain the Kerberos ticket to connect to Parquet.
Note: If you would prefer not to edit KRB5CCNAME, you can use the KerberosTicketCache property to set the file path manually. After this is set, the Sync App uses the specified cache file to obtain the Kerberos ticket to connect to Parquet.
Keytab File
If your environment lacks the KRB5CCNAME environment variable, you can retrieve a Kerberos ticket using a Keytab File.
To use this method, set the User property to the desired username, and set the KerberosKeytabFile property to a file path pointing to the keytab file associated with the user.
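For example (a sketch; the user name and keytab path are placeholders):
AuthScheme=NEGOTIATE;User=myUser;KerberosKeytabFile=C:\krb\myUser.keytab;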
User and Password
If your environment lacks the KRB5CCNAME environment variable and the KerberosKeytabFile property has not been set, you can retrieve a ticket using a user and password combination.
To use this method, set the User and Password properties to the user/password combination that you use to authenticate with Parquet.
To enable this kind of cross-realm authentication, set the KerberosRealm and KerberosKDC properties to the values required for user authentication. Also, set the KerberosServiceRealm and KerberosServiceKDC properties to the values required to obtain the service ticket.
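For example (a sketch; the realm and KDC host names are placeholders):
KerberosRealm=USERS.EXAMPLE.COM;KerberosKDC=kdc.users.example.com;KerberosServiceRealm=SERVICE.EXAMPLE.COM;KerberosServiceKDC=kdc.service.example.com;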
This section shows how to control the various schemes that the Sync App offers to bridge the gap between relational SQL and nested Parquet data. The CData Sync App provides a managed way for you to use the two prevailing techniques for dealing with nested Parquet data:
By default, the Sync App automatically detects the rows in a document, so that you do not need to know the structure of the underlying data to query it with SQL. Set the DataModel property to choose a basic configuration of how the Sync App models object arrays into tables. Set the FlattenObjects and FlattenArrays properties to configure how nested data is flattened into columns. See Parsing Hierarchical Data for a guide.
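For example, the following connection string (a sketch; the file path is a placeholder) flattens nested objects into columns and exposes the first two elements of nested arrays as columns:
URI=C:\people.parquet;FlattenObjects=True;FlattenArrays=2;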
Below is the raw data used throughout this chapter. The data includes entries for people, the cars they own, and various maintenance services performed on those cars:
{ "people": [ { "personal": { "age": 20, "gender": "M", "name": { "first": "John", "last": "Doe" } }, "vehicles": [ { "type": "car", "model": "Honda Civic", "insurance": { "company": "ABC Insurance", "policy_num": "12345" }, "maintenance": [ { "date": "07-17-2017", "desc": "oil change" }, { "date": "01-03-2018", "desc": "new tires" } ] }, { "type": "truck", "model": "Dodge Ram", "insurance": { "company": "ABC Insurance", "policy_num": "12345" }, "maintenance": [ { "date": "08-27-2017", "desc": "new tires" }, { "date": "01-08-2018", "desc": "oil change" } ] } ], "source": "internet" }, { "personal": { "age": 24, "gender": "F", "name": { "first": "Jane", "last": "Roberts" } }, "vehicles": [ { "type": "car", "model": "Toyota Camry", "insurance": { "company": "Car Insurance", "policy_num": "98765" }, "maintenance": [ { "date": "05-11-2017", "desc": "tires rotated" }, { "date": "11-03-2017", "desc": "oil change" } ] }, { "type": "car", "model": "Honda Accord", "insurance": { "company": "Car Insurance", "policy_num": "98765" }, "maintenance": [ { "date": "10-07-2017", "desc": "new air filter" }, { "date": "01-13-2018", "desc": "new brakes" } ] } ], "source": "phone" } ] }
The Sync App offers three basic configurations to model object arrays as tables, described in the following sections. The Sync App will parse the document and identify the object arrays.
For users who simply need access to the entirety of their Parquet data, flattening the data into a single table is the best option. In this mode, the Sync App uses streaming and parses the data only once per query.
With DataModel set to "FlattenedDocuments", nested values act in the same manner as a SQL JOIN. Any nested sibling values (child paths at the same height) are treated as a SQL CROSS JOIN.
Below is a sample query and the results, based on the sample document in Raw Data. This implicitly JOINs the people collection with the vehicles collection and implicitly JOINs the vehicles collection with the maintenance collection.
Use the following connection string to query the Raw Data in this example.
URI=C:\people.parquet;DataModel=FlattenedDocuments;
The following query drills into the nested elements in each people object.
SELECT
[personal.age] AS age,
[personal.gender] AS gender,
[personal.name.first] AS name_first,
[personal.name.last] AS name_last,
[source],
[type],
[model],
[insurance.company] AS ins_company,
[insurance.policy_num] AS ins_policy_num,
[date] AS maint_date,
[desc] AS maint_desc
FROM
[people]
With horizontal and vertical flattening based on the described paths, each vehicle object is implicitly JOINed to its parent people object and each maintenance object is implicitly JOINed to its parent vehicle object.
age | gender | name_first | name_last | source | type | model | ins_company | ins_policy_num | maint_date | maint_desc |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2017-07-17 | oil change | |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2018-01-03 | new tires | |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2017-08-27 | new tires | |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2018-01-08 | oil change | |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-05-11 | tires rotated | |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-11-03 | oil change | |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2017-10-07 | new air filter | |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2018-01-13 | new brakes |
Using a top-level document view of the data provides ready access to top-level elements. The Sync App returns nested elements in aggregate, as single columns.
One aspect to consider is performance: you forgo the time and resources needed to process and parse nested elements, because the Sync App parses the returned data once, using streaming to read the data. Another consideration is your need to access any data stored in nested parent elements, and the ability of your tool or application to process the data.
With DataModel set to "Document" (the default), the Sync App scans only a single object array, the top-level object array by default. The top-level object elements are available as columns due to the default object flattening. Nested object arrays are returned as aggregated strings.
Below is a sample query and the results, based on the sample document in Raw Data. The query results in a single "people" table.
Set the DataModel connection property to "Document" to perform the following query and see the example result set.
URI=C:\people.parquet;DataModel=Document;
The following query pulls the top-level object elements and the vehicles array into the results.
SELECT
[personal.age] AS age,
[personal.gender] AS gender,
[personal.name.first] AS name_first,
[personal.name.last] AS name_last,
[source],
[vehicles]
FROM
[people]
With a document view of the data, the personal object is flattened into 4 columns and the source and vehicles elements are returned as individual columns, resulting in a table with 6 columns.
age | gender | name_first | name_last | source | vehicles | |
20 | M | John | Doe | internet | [{"type":"car","model":"Honda Civic","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"07-17-2017","desc":"oil change"},{"date":"01-03-2018","desc":"new tires"}]},{"type":"truck","model":"Dodge Ram","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"08-27-2017","desc":"new tires"},{"date":"01-08-2018","desc":"oil change"}]}] |
24 | F | Jane | Roberts | phone | [{"type":"car","model":"Toyota Camry","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"05-11-2017","desc":"tires rotated"},{"date":"11-03-2017","desc":"oil change"}]},{"type":"car","model":"Honda Accord","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"10-07-2017","desc":"new air filter"},{"date":"01-13-2018","desc":"new brakes"}]}] |
The CData Sync App can be configured to create a relational model of the data, treating nested object arrays as individual tables containing a primary key and a foreign key that links to the parent document. This is particularly useful if you need to work with your data in existing BI, reporting, and ETL tools that expect a relational data model.
With DataModel set to "Relational", any JOINs are controlled by the query. Any time you perform a JOIN query, the file or source will be queried once for each table (nested array) included in the query.
Below is a sample query against the sample document in Raw Data, using a relational model.
URI=C:\people.parquet;DataModel=Relational;
The following query explicitly JOINs the people, vehicles, and maintenance tables.
SELECT
[people].[personal.age] AS age,
[people].[personal.gender] AS gender,
[people].[personal.name.first] AS first_name,
[people].[personal.name.last] AS last_name,
[people].[source],
[vehicles].[type],
[vehicles].[model],
[vehicles].[insurance.company] AS ins_company,
[vehicles].[insurance.policy_num] AS ins_policy_num,
[maintenance].[date] AS maint_date,
[maintenance].[desc] AS maint_desc
FROM
[people]
JOIN
[vehicles]
ON
[people].[_id] = [vehicles].[people_id]
JOIN
[maintenance]
ON
[vehicles].[_id] = [maintenance].[vehicles_id]
In the example query, each maintenance object is JOINed to its parent vehicle object, which is JOINed to its parent people object to produce a table with 8 rows (2 maintenance entries for each of 2 vehicles each for 2 people).
age | gender | first_name | last_name | source | type | model | ins_company | ins_policy_num | maint_date | maint_desc | ||
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2017-07-17 | oil change | ||
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2018-01-03 | new tires | ||
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2017-08-27 | new tires | ||
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2018-01-08 | oil change | ||
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-05-11 | tires rotated | ||
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-11-03 | oil change | ||
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2017-10-07 | new air filter | ||
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2018-01-13 | new brakes |
This section details a selection of advanced features of the Parquet Sync App.
The Sync App allows you to define virtual tables, called user defined views, whose contents are decided by a pre-configured query. These views are useful when you cannot directly control queries being issued to the drivers. See User Defined Views for an overview of creating and configuring custom views.
Use SSL Configuration to adjust how the Sync App handles TLS/SSL certificate negotiations. You can choose from various certificate formats; see the SSLServerCert property under "Connection String Options" for more information.
Configure the Sync App to connect through a Firewall and Proxy, including Windows proxies and HTTP proxies. You can also set up tunnel connections.
The Sync App offloads as much of the SELECT statement processing as possible to Parquet and then processes the rest of the query in memory (client-side).
See Query Processing for more information.
See Logging for an overview of configuration settings that can be used to refine CData logging. For basic logging, you only need to set two connection properties, but there are numerous features that support more refined logging, where you can select subsets of information to be logged using the LogModules connection property.
By default, the Sync App attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store.
To specify another certificate, see the SSLServerCert property for the available formats to do so.
The Parquet Sync App also supports setting client certificates. Set the following to connect using a client certificate.
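For example (a sketch; the certificate type value, path, and password shown are illustrative placeholders):
SSLClientCert=C:\certs\client.pfx;SSLClientCertType=PFXFILE;SSLClientCertPassword=myCertPassword;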
To connect through the Windows system proxy, you do not need to set any additional connection properties. To connect to other proxies, set ProxyAutoDetect to false.
To authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword, in addition to ProxyServer and ProxyPort.
Set the following properties:
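For example (a sketch; the proxy host, port, and credentials shown are illustrative placeholders):
ProxyAutoDetect=false;ProxyServer=proxy.example.com;ProxyPort=8080;ProxyAuthScheme=BASIC;ProxyUser=myProxyUser;ProxyPassword=myProxyPassword;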
This section shows the available API objects and provides more information on executing SQL to Parquet APIs.
The connection string properties are the various options that can be used to establish a connection. This section provides a complete list of the options you can configure in the connection string for this provider. Click the links for further details.
For more information on establishing a connection, see Establishing a Connection.
Property | Description |
AuthScheme | The type of authentication to use when connecting to remote services. |
AccessKey | Your account access key. This value is accessible from your security credentials page. |
SecretKey | Your account secret key. This value is accessible from your security credentials page. |
ApiKey | The API Key used to identify the user to IBM Cloud. |
User | The user account used to authenticate. |
Password | The password used to authenticate the user. |
SharePointEdition | The edition of SharePoint being used. Set either SharePointOnline or SharePointOnPremise. |
Property | Description |
ConnectionType | Specifies the file storage service, server, or file access protocol through which your Parquet files are stored and retrieved. |
URI | The Uniform Resource Identifier (URI) for the Parquet resource location. |
DataModel | Specifies the data model to use when parsing Parquet documents and generating the database metadata. |
Region | The hosting region for your S3-like Web Services. |
ProjectId | The Id of the project where your Google Cloud Storage instance resides. |
OracleNamespace | The Oracle Cloud Object Storage namespace to use. |
StorageBaseURL | The URL of a cloud storage service provider. |
UseVirtualHosting | If true (default), buckets will be referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the provider will use the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for S3-based custom services when the CustomURL is specified. |
UseLakeFormation | When this property is set to true, the AWS Lake Formation service will be used to retrieve temporary credentials, which enforce access policies against the user based on the configured IAM role. The service can be used when authenticating through Okta, ADFS, AzureAD, or PingFederate, while providing a SAML assertion. |
Property | Description |
AWSAccessKey | Your AWS account access key. This value is accessible from your AWS security credentials page. |
AWSSecretKey | Your AWS account secret key. This value is accessible from your AWS security credentials page. |
AWSRoleARN | The Amazon Resource Name of the role to use when authenticating. |
AWSPrincipalARN | The ARN of the SAML Identity provider in your AWS account. |
AWSRegion | The hosting region for your Amazon Web Services. |
AWSCredentialsFile | The path to the AWS Credentials File to be used for authentication. |
AWSCredentialsFileProfile | The name of the profile to be used from the supplied AWSCredentialsFile. |
AWSSessionToken | Your AWS session token. |
AWSExternalId | A unique identifier that might be required when you assume a role in another account. |
MFASerialNumber | The serial number of the MFA device if one is being used. |
MFAToken | The temporary token available from your MFA device. |
TemporaryTokenDuration | The amount of time (in seconds) a temporary token will last. |
ServerSideEncryption | When activated, file uploads into Amazon S3 buckets will be server-side encrypted. |
SSEContext | A BASE64-encoded UTF-8 string holding JSON which represents a string-string (key-value) map. |
SSEEnableS3BucketKeys | Configuration to use an S3 Bucket Key at the object level when encrypting data with AWS KMS. Enabling this will reduce the cost of server-side encryption by lowering calls to AWS KMS. |
SSEKey | A symmetric encryption KeyManagementService key that is used to protect the data when using ServerSideEncryption. |
Property | Description |
AzureStorageAccount | The name of your Azure storage account. |
AzureAccessKey | The storage key associated with your Azure account. |
AzureSharedAccessSignature | A shared access key signature that may be used for authentication. |
AzureTenant | The Microsoft Online tenant being used to access data. If not specified, your default tenant is used. |
AzureEnvironment | The Azure Environment to use when establishing a connection. |
Property | Description |
SSOLoginURL | The identity provider's login URL. |
SSOProperties | Additional properties required to connect to the identity provider in a semicolon-separated list. |
SSOExchangeUrl | The URL used for consuming the SAML response and exchanging it for service specific credentials. |
Property | Description |
OAuthVersion | The version of OAuth being used. |
OAuthClientId | The client Id assigned when you register your application with an OAuth authorization server. |
OAuthClientSecret | The client secret assigned when you register your application with an OAuth authorization server. |
Scope | Specify scope to obtain the initial access and refresh token. |
OAuthGrantType | The grant type for the OAuth flow. |
OAuthPasswordGrantMode | How to pass the Client Id and Secret when OAuthGrantType is set to Password. |
OAuthIncludeCallbackURL | Whether to include the callback URL in an access token request. |
OAuthAuthorizationURL | The authorization URL for the OAuth service. |
OAuthAccessTokenURL | The URL to retrieve the OAuth access token from. |
OAuthRefreshTokenURL | The URL to refresh the OAuth token from. |
OAuthRequestTokenURL | The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0. |
AuthToken | The authentication token used to request and obtain the OAuth Access Token. |
AuthKey | The authentication secret used to request and obtain the OAuth Access Token. |
OAuthParams | A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value. |
Property | Description |
OAuthJWTCert | The JWT Certificate store. |
OAuthJWTCertType | The type of key store containing the JWT Certificate. |
OAuthJWTCertPassword | The password for the OAuth JWT certificate. |
OAuthJWTCertSubject | The subject of the OAuth JWT certificate. |
Property | Description |
KerberosKDC | The Kerberos Key Distribution Center (KDC) service used to authenticate the user. |
KerberosRealm | The Kerberos Realm used to authenticate the user. |
KerberosSPN | The service principal name (SPN) for the Kerberos Domain Controller. |
KerberosKeytabFile | The Keytab file containing your pairs of Kerberos principals and encrypted keys. |
KerberosServiceRealm | The Kerberos realm of the service. |
KerberosServiceKDC | The Kerberos KDC of the service. |
KerberosTicketCache | The full file path to an MIT Kerberos credential cache file. |
Property | Description |
SSLClientCert | The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL). |
SSLClientCertType | The type of key store containing the TLS/SSL client certificate. |
SSLClientCertPassword | The password for the TLS/SSL client certificate. |
SSLClientCertSubject | The subject of the TLS/SSL client certificate. |
SSLMode | The authentication mechanism to be used when connecting to the FTP or FTPS server. |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
Property | Description |
SSHAuthMode | The authentication method used when establishing an SSH Tunnel to the service. |
SSHClientCert | A certificate to be used for authenticating the SSHUser. |
SSHClientCertPassword | The password of the SSHClientCert key if it has one. |
SSHClientCertSubject | The subject of the SSH client certificate. |
SSHClientCertType | The type of SSHClientCert private key. |
SSHUser | The SSH user. |
SSHPassword | The SSH password. |
Property | Description |
FirewallType | The protocol used by a proxy-based firewall. |
FirewallServer | The name or IP address of a proxy-based firewall. |
FirewallPort | The TCP port for a proxy-based firewall. |
FirewallUser | The user name to use to authenticate with a proxy-based firewall. |
FirewallPassword | A password used to authenticate to a proxy-based firewall. |
Property | Description |
ProxyAutoDetect | This indicates whether to use the system proxy settings or not. |
ProxyServer | The hostname or IP address of a proxy to route HTTP traffic through. |
ProxyPort | The TCP port the ProxyServer proxy is running on. |
ProxyAuthScheme | The authentication type to use to authenticate to the ProxyServer proxy. |
ProxyUser | A user name to be used to authenticate to the ProxyServer proxy. |
ProxyPassword | A password to be used to authenticate to the ProxyServer proxy. |
ProxySSLType | The SSL type to use when connecting to the ProxyServer proxy. |
ProxyExceptions | A semicolon separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer . |
Property | Description |
LogModules | Core modules to be included in the log file. |
Property | Description |
Location | A path to the directory that contains the schema files defining tables, views, and stored procedures. |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC. |
Tables | This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC. |
Views | Restricts the views reported to a subset of the available tables. For example, Views=ViewA,ViewB,ViewC. |
FlattenObjects | Set FlattenObjects to true to flatten object properties into columns of their own. Otherwise, objects nested in arrays are returned as strings of JSON. |
FlattenArrays | By default, nested arrays are returned as strings. The FlattenArrays property can be used to flatten the elements of nested arrays into columns of their own. Set FlattenArrays to the number of elements you want to return from nested arrays. |
Property | Description |
AggregateFiles | When set to true, the provider will aggregate all the files in the URI directory into a single result. With this option enabled, an AggregatedFiles table is exposed, which can be used to query the combined dataset. |
Charset | Specifies the session character set for encoding and decoding character data transferred to and from the Parquet file. The default value is UTF-8. |
ClientCulture | This property can be used to specify the format of data (e.g., currency values) that is accepted by the client application. This property can be used when the client application does not support the machine's culture settings. For example, Microsoft Access requires 'en-US'. |
Compression | Specifies which compression encoding to be used when creating .parquet files using Create Table Statement and Bulk Inserts. |
Culture | This setting can be used to specify culture settings that determine how the provider interprets certain data types that are passed into the provider. For example, setting Culture='de-DE' will output German formats even on an American machine. |
DeleteDownloadedFiles | When set to true, the provider will delete parsed .parquet files downloaded from cloud sources. |
DirectoryRetrievalDepth | Limit the subfolders recursively scanned when IncludeSubdirectories is enabled. |
EnableDictionary | When set to true, the provider will enable dictionary encoding when creating .parquet files using Create Table Statement and Bulk Inserts. |
ExcludeFiles | Comma-separated list of file extensions to exclude from the set of the files modeled as tables. |
FolderId | The ID of a folder in Google Drive. If set, the resource location specified by the URI is relative to the Folder ID for all operations. |
IncludeDropboxTeamResources | Indicates if you want to include Dropbox team files and folders. |
IncludeFiles | Comma-separated list of file extensions to include into the set of the files modeled as tables. |
IncludeItemsFromAllDrives | Whether Google Drive shared drive items should be included in results. If not present or set to false, then shared drive items are not returned. |
IncludeSubdirectories | Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names. |
InsertMode | The behavior when using bulk inserts to create Parquet files. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses. |
MetadataDiscoveryURI | Used when aggregating multiple files into one table, this property specifies a specific file to read to determine the aggregated table schema. |
Other | These hidden properties are used only in specific use cases. |
PageSize | (Optional) PageSize value. |
PathSeparator | Determines the character which will be used to replace the file separator. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns in the table. |
TemporaryLocalFolder | The path, or URI, to the folder that is used to temporarily download parquet file(s). |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
UserDefinedViews | A filepath pointing to the JSON configuration file containing your custom views. |
This section provides a complete list of the Authentication properties you can configure in the connection string for this provider.
Property | Description |
AuthScheme | The type of authentication to use when connecting to remote services. |
AccessKey | Your account access key. This value is accessible from your security credentials page. |
SecretKey | Your account secret key. This value is accessible from your security credentials page. |
ApiKey | The API Key used to identify the user to IBM Cloud. |
User | The user account used to authenticate. |
Password | The password used to authenticate the user. |
SharePointEdition | The edition of SharePoint being used. Set either SharePointOnline or SharePointOnPremise. |
The type of authentication to use when connecting to remote services.
The following options are available when ConnectionType is set to Amazon S3:
The following options are available when ConnectionType is set to Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure Data Lake Storage Gen2 SSL, or OneDrive:
The following options are available when ConnectionType is set to Box:
Only the following option is available when ConnectionType is set to Dropbox:
OAuth: Uses either OAuth1 or OAuth2, with the specific flow being determined by the OAuthGrantType. OAuthVersion must be set to determine what version of OAuth is used.
Only the following option is available when ConnectionType is set to FTP or FTPS:
Basic: Basic user credentials (user/password).
The following options are available when ConnectionType is set to Google Cloud Storage or Google Drive:
The following options are available when ConnectionType is set to HDFS or HDFS Secure:
The following options are available when ConnectionType is set to HTTP or HTTPS:
The following options are also available when ConnectionType is set to IBM Object Storage Source:
Only the following option is available when ConnectionType is set to Oracle Cloud Storage:
HMAC: Uses AccessKey and SecretKey to authenticate to the Oracle Cloud Storage.
This ConnectionType defaults to using an AuthScheme called SFTP, but the authentication method is actually controlled using the SSHAuthMode property. See this property's documentation for further information.
The following options are also available when ConnectionType is set to SharePoint REST:
The following options are also available when ConnectionType is set to SharePoint SOAP:
Your account access key. This value is accessible from your security credentials page.
Your account access key. This value is accessible from your security credentials page depending on the service you are using.
Your account secret key. This value is accessible from your security credentials page.
Your account secret key. This value is accessible from your security credentials page depending on the service you are using.
The API Key used to identify the user to IBM Cloud.
Access to resources in the Parquet REST API is governed by an API key, which is used to retrieve an access token. An API Key can be created by navigating to Manage --> Access (IAM) --> Users and clicking 'Create'.
The user account used to authenticate.
Together with Password, this field is used to authenticate against the server.
This property will refer to different things based on the context, namely the value of ConnectionType and AuthScheme:
The password used to authenticate the user.
User and Password are used together to authenticate with the server.
This property will refer to different things based on the context, namely the value of ConnectionType and AuthScheme:
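For illustration, a minimal Basic-authentication connection to an FTP server might look like the following (the host, port, folder, and credentials are placeholders):
URI=ftp://localhost:21/folder1; User=user1; Password=password1;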
This section provides a complete list of the Connection properties you can configure in the connection string for this provider.
Property | Description |
ConnectionType | Specifies the file storage service, server, or file access protocol through which your Parquet files are stored and retrieved. |
URI | The Uniform Resource Identifier (URI) for the Parquet resource location. |
DataModel | Specifies the data model to use when parsing Parquet documents and generating the database metadata. |
Region | The hosting region for your S3-like Web Services. |
ProjectId | The Id of the project where your Google Cloud Storage instance resides. |
OracleNamespace | The Oracle Cloud Object Storage namespace to use. |
StorageBaseURL | The URL of a cloud storage service provider. |
UseVirtualHosting | If true (default), buckets will be referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the Sync App will use the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for S3-based custom services when CustomURL is specified. |
UseLakeFormation | When this property is set to true, the AWS Lake Formation service is used to retrieve temporary credentials, which enforce access policies against the user based on the configured IAM role. The service can be used when authenticating through OKTA, ADFS, AzureAD, or PingFederate while providing a SAML assertion. |
Specifies the file storage service, server, or file access protocol through which your Parquet files are stored and retrieved.
Set the ConnectionType to one of the following:
The Uniform Resource Identifier (URI) for the Parquet resource location.
Set the URI property to specify a path to a file or stream.
NOTE:
See for more advanced features available for parsing and merging multiple files.
Below are examples of the URI formats for the available data sources:
Service provider | URI formats |
Local | Single File Path (one table): localPath/file.parquet or file://localPath/file.parquet. Directory Path (one table per file): localPath or file://localPath |
HTTP or HTTPS | http://remoteStream or https://remoteStream |
Amazon S3 | Single File Path (one table): s3://remotePath/file.parquet. Directory Path (one table per file): s3://remotePath |
Azure Blob Storage | Single File Path (one table): azureblob://mycontainer/myblob/file.parquet. Directory Path (one table per file): azureblob://mycontainer/myblob/ |
OneDrive | Single File Path (one table): onedrive://remotePath/file.parquet. Directory Path (one table per file): onedrive://remotePath |
Google Cloud Storage | Single File Path (one table): gs://bucket/remotePath/file.parquet. Directory Path (one table per file): gs://bucket/remotePath |
Google Drive | Single File Path (one table): gdrive://remotePath/file.parquet. Directory Path (one table per file): gdrive://remotePath |
Box | Single File Path (one table): box://remotePath/file.parquet. Directory Path (one table per file): box://remotePath |
FTP or FTPS | Single File Path (one table): ftp://server:port/remotePath/file.parquet. Directory Path (one table per file): ftp://server:port/remotePath |
SFTP | Single File Path (one table): sftp://server:port/remotePath/file.parquet. Directory Path (one table per file): sftp://server:port/remotePath |
SharePoint | Single File Path (one table): sp://https://server/remotePath/file.parquet. Directory Path (one table per file): sp://https://server/remotePath |
Below are example connection strings to Parquet files or streams.
Service provider | URI formats | Connection example |
Local | Single File Path (one table): localPath/file.parquet or file://localPath/file.parquet. Directory Path (one table per file): localPath or file://localPath | URI=C:\folder1/file.parquet |
Amazon S3 | Single File Path (one table): s3://bucket1/folder1/file.parquet. Directory Path (one table per file): s3://bucket1/folder1 | URI=s3://bucket1/folder1/file.parquet; AWSAccessKey=token1; AWSSecretKey=secret1; AWSRegion=OHIO; |
Azure Blob Storage | Single File Path (one table): azureblob://mycontainer/myblob/file.parquet. Directory Path (one table per file): azureblob://mycontainer/myblob/ | URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AzureAccessKey=myKey; or URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AuthScheme=OAuth; |
OneDrive | Single File Path (one table): onedrive://remotePath/file.parquet. Directory Path (one table per file): onedrive://remotePath | URI=onedrive://folder1/file.parquet; AuthScheme=OAuth; or URI=onedrive://SharedWithMe/folder1/file.parquet; AuthScheme=OAuth; |
Google Cloud Storage | Single File Path (one table): gs://bucket/remotePath/file.parquet. Directory Path (one table per file): gs://bucket/remotePath | URI=gs://bucket/folder1/file.parquet; AuthScheme=OAuth; ProjectId=test; |
Google Drive | Single File Path (one table): gdrive://remotePath/file.parquet. Directory Path (one table per file): gdrive://remotePath | URI=gdrive://folder1/file.parquet; |
Box | Single File Path (one table): box://remotePath/file.parquet. Directory Path (one table per file): box://remotePath | URI=box://folder1/file.parquet; OAuthClientId=oauthclientid1; OAuthClientSecret=oauthclientsecret1; CallbackUrl=http://localhost:12345; |
FTP or FTPS | Single File Path (one table): ftp://server:port/remotePath/file.parquet. Directory Path (one table per file): ftp://server:port/remotePath | URI=ftps://localhost:990/folder1/file.parquet; User=user1; Password=password1; |
SFTP | sftp://server:port/remotePath | URI=sftp://127.0.0.1:22/remotePath/file.parquet; User=user1; Password=password1; |
SharePoint | sp://https://server/remotePath | URI=sp://https://domain.sharepoint.com/Documents/file.parquet; User=user1; Password=password1; |
Specifies the data model to use when parsing Parquet documents and generating the database metadata.
The Sync App splits documents into rows based on the objects nested in arrays. Select a DataModel configuration to configure how the Sync App models nested object arrays into tables.
The following DataModel configurations are available.
Document
Returns a single table representing a row for each top-level object. In this data model, any nested object arrays will not be flattened and will be returned as aggregates.
FlattenedDocuments
Returns a single table representing a SQL CROSS JOIN of the available documents in the file.
Relational
Returns multiple tables, one for each nested object array. In this data model, any nested documents (object arrays) will be returned as relational tables that contain a primary key and a foreign key that links to the parent table.
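As an illustration (the file path below is a placeholder), selecting a data model is just another connection property:
URI=C:\folder1\file.parquet; DataModel=Relational;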
The hosting region for your S3-like Web Services.
The hosting region for your S3-like Web Services.
Value | Region |
Commercial Cloud Regions | |
ap-hyderabad-1 | India South (Hyderabad) |
ap-melbourne-1 | Australia Southeast (Melbourne) |
ap-mumbai-1 | India West (Mumbai) |
ap-osaka-1 | Japan Central (Osaka) |
ap-seoul-1 | South Korea Central (Seoul) |
ap-sydney-1 | Australia East (Sydney) |
ap-tokyo-1 | Japan East (Tokyo) |
ca-montreal-1 | Canada Southeast (Montreal) |
ca-toronto-1 | Canada Southeast (Toronto) |
eu-amsterdam-1 | Netherlands Northwest (Amsterdam) |
eu-frankfurt-1 | Germany Central (Frankfurt) |
eu-zurich-1 | Switzerland North (Zurich) |
me-jeddah-1 | Saudi Arabia West (Jeddah) |
sa-saopaulo-1 | Brazil East (Sao Paulo) |
uk-london-1 | UK South (London) |
us-ashburn-1 (default) | US East (Ashburn, VA) |
us-phoenix-1 | US West (Phoenix, AZ) |
US Gov FedRAMP High Regions | |
us-langley-1 | US Gov East (Ashburn, VA) |
us-luke-1 | US Gov West (Phoenix, AZ) |
US Gov DISA IL5 Regions | |
us-gov-ashburn-1 | US DoD East (Ashburn, VA) |
us-gov-chicago-1 | US DoD North (Chicago, IL) |
us-gov-phoenix-1 | US DoD West (Phoenix, AZ) |
Value | Region |
eu-central-1 | Europe (Amsterdam) |
us-east-1 (Default) | US East (Ashburn, VA) |
us-east-2 | US East (Manassas, VA) |
us-west-1 | US West (Hillsboro, OR) |
The Id of the project where your Google Cloud Storage instance resides.
The Id of the project where your Google Cloud Storage instance resides. You can find this value by going to the Google Cloud Console and clicking the project name at the top left of the screen. The ProjectId is displayed in the Id column of the matching project.
The Oracle Cloud Object Storage namespace to use.
The Oracle Cloud Object Storage namespace to use. This setting must be set to the Oracle Cloud Object Storage namespace associated with the Oracle Cloud account before any requests can be made. Refer to the Understanding Object Storage Namespaces page of the Oracle Cloud documentation for instructions on how to find your account's Object Storage namespace.
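As a sketch only (the namespace, region, and keys below are placeholders, and the URI and any other required properties are omitted), an Oracle Cloud Object Storage connection typically combines these settings with HMAC credentials:
ConnectionType=Oracle Cloud Storage; OracleNamespace=mynamespace; Region=us-ashburn-1; AccessKey=myAccessKey; SecretKey=mySecretKey;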
The URL of a cloud storage service provider.
This connection property is used to specify:
If the domain for this option ends in -my (for example, https://bigcorp-my.sharepoint.com) then you may need to use the onedrive:// scheme instead of the sp:// or sprest:// scheme.
If true (default), buckets will be referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the Sync App will use the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for S3-based custom services when CustomURL is specified.
If true (default), buckets will be referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the Sync App will use the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for S3-based custom services when CustomURL is specified.
When this property is set to true, the AWS Lake Formation service is used to retrieve temporary credentials, which enforce access policies against the user based on the configured IAM role. The service can be used when authenticating through OKTA, ADFS, AzureAD, or PingFederate while providing a SAML assertion.
When this property is set to true, the AWS Lake Formation service is used to retrieve temporary credentials, which enforce access policies against the user based on the configured IAM role. The service can be used when authenticating through OKTA, ADFS, AzureAD, or PingFederate while providing a SAML assertion.
This section provides a complete list of the AWS Authentication properties you can configure in the connection string for this provider.
Property | Description |
AWSAccessKey | Your AWS account access key. This value is accessible from your AWS security credentials page. |
AWSSecretKey | Your AWS account secret key. This value is accessible from your AWS security credentials page. |
AWSRoleARN | The Amazon Resource Name of the role to use when authenticating. |
AWSPrincipalARN | The ARN of the SAML Identity provider in your AWS account. |
AWSRegion | The hosting region for your Amazon Web Services. |
AWSCredentialsFile | The path to the AWS Credentials File to be used for authentication. |
AWSCredentialsFileProfile | The name of the profile to be used from the supplied AWSCredentialsFile. |
AWSSessionToken | Your AWS session token. |
AWSExternalId | A unique identifier that might be required when you assume a role in another account. |
MFASerialNumber | The serial number of the MFA device if one is being used. |
MFAToken | The temporary token available from your MFA device. |
TemporaryTokenDuration | The amount of time (in seconds) a temporary token will last. |
ServerSideEncryption | When activated, file uploads into Amazon S3 buckets will be server-side encrypted. |
SSEContext | A BASE64-encoded UTF-8 string holding JSON which represents a string-string (key-value) map. |
SSEEnableS3BucketKeys | Configuration to use an S3 Bucket Key at the object level when encrypting data with AWS KMS. Enabling this will reduce the cost of server-side encryption by lowering calls to AWS KMS. |
SSEKey | A symmetric encryption KeyManagementService key, that is used to protect the data when using ServerSideEncryption. |
Your AWS account access key. This value is accessible from your AWS security credentials page.
Your AWS account access key. This value is accessible from your AWS security credentials page:
Your AWS account secret key. This value is accessible from your AWS security credentials page.
Your AWS account secret key. This value is accessible from your AWS security credentials page:
The Amazon Resource Name of the role to use when authenticating.
When authenticating outside of AWS, it is common to use a Role for authentication instead of your direct AWS account credentials. Entering the AWSRoleARN will cause the CData Sync App to perform role-based authentication instead of using the AWSAccessKey and AWSSecretKey directly. The AWSAccessKey and AWSSecretKey must still be specified to perform this authentication. You cannot use the credentials of an AWS root user when setting AWSRoleARN; the AWSAccessKey and AWSSecretKey must be those of an IAM user.
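For example (the bucket, keys, and ARN shown are placeholders), a role-based connection might look like:
URI=s3://bucket1/folder1; AWSAccessKey=myIamUserKey; AWSSecretKey=myIamUserSecret; AWSRoleARN=arn:aws:iam::123456789012:role/MyRole; AWSRegion=OHIO;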
The ARN of the SAML Identity provider in your AWS account.
The ARN of the SAML Identity provider in your AWS account.
The hosting region for your Amazon Web Services.
The hosting region for your Amazon Web Services. Available values are OHIO, NORTHERNVIRGINIA, NORTHERNCALIFORNIA, OREGON, CAPETOWN, HONGKONG, JAKARTA, MUMBAI, OSAKA, SEOUL, SINGAPORE, SYDNEY, TOKYO, CENTRAL, BEIJING, NINGXIA, FRANKFURT, IRELAND, LONDON, MILAN, PARIS, STOCKHOLM, ZURICH, BAHRAIN, UAE, SAOPAULO, GOVCLOUDEAST, and GOVCLOUDWEST.
The path to the AWS Credentials File to be used for authentication.
The path to the AWS Credentials File to be used for authentication. See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html for more information.
The name of the profile to be used from the supplied AWSCredentialsFile.
The name of the profile to be used from the supplied AWSCredentialsFile. See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html for more information.
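As an illustration (the bucket, file path, and profile name are placeholders), a connection that authenticates from a credentials file might look like:
URI=s3://bucket1/folder1; AWSCredentialsFile=C:\Users\jdoe\.aws\credentials; AWSCredentialsFileProfile=analytics;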
Your AWS session token.
Your AWS session token. This value can be retrieved in different ways. See this link for more info.
A unique identifier that might be required when you assume a role in another account.
A unique identifier that might be required when you assume a role in another account.
The serial number of the MFA device if one is being used.
You can find the device for an IAM user by going to the AWS Management Console and viewing the user's security credentials. For virtual devices, this is actually an Amazon Resource Name (such as arn:aws:iam::123456789012:mfa/user).
The temporary token available from your MFA device.
If MFA is required, this value will be used along with the MFASerialNumber to retrieve temporary credentials to login. The temporary credentials available from AWS will only last up to 1 hour by default (see TemporaryTokenDuration). Once the time is up, the connection must be updated to specify a new MFA token so that new credentials may be obtained.
The amount of time (in seconds) a temporary token will last.
Temporary tokens are used with both MFA and Role based authentication. Temporary tokens will eventually time out, at which time a new temporary token must be obtained. When MFA is not used, this is handled transparently: the CData Sync App will internally request a new temporary token once the current temporary token has expired.
However, for connections that require MFA, a new MFAToken must be specified in the connection to retrieve a new temporary token. This is more intrusive, since it requires the user to update the connection. The maximum and minimum durations that can be specified depend largely on the connection being used.
For Role based authentication, the minimum duration is 900 seconds (15 minutes) while the maximum is 3600 seconds (1 hour). Even if MFA is used with role based authentication, 3600 is still the maximum.
For MFA authentication by itself (using an IAM User or root user), the minimum is 900 seconds (15 minutes) and the maximum is 129600 seconds (36 hours).
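As a sketch (the keys, serial number, and token below are placeholders), an MFA connection might combine:
URI=s3://bucket1/folder1; AWSAccessKey=myKey; AWSSecretKey=mySecret; MFASerialNumber=arn:aws:iam::123456789012:mfa/user1; MFAToken=123456; TemporaryTokenDuration=3600;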
When activated, file uploads into Amazon S3 buckets will be server-side encrypted.
Server-side encryption is the encryption of data at its destination by the application or service that receives it. Amazon S3 encrypts your data at the object level as it writes it to disks in its data centers and decrypts it for you when you access it. Learn more: https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html
A BASE64-encoded UTF-8 string holding JSON which represents a string-string (key-value) map.
Example of what the JSON may look like when decoded: {"aws:s3:arn": "arn:aws:s3:::_bucket_/_object_"}.
Configuration to use an S3 Bucket Key at the object level when encrypting data with AWS KMS. Enabling this will reduce the cost of server-side encryption by lowering calls to AWS KMS.
Configuration to use an S3 Bucket Key at the object level when encrypting data with AWS KMS. Enabling this will reduce the cost of server-side encryption by lowering calls to AWS KMS.
A symmetric encryption KeyManagementService key, that is used to protect the data when using ServerSideEncryption.
A symmetric encryption KeyManagementService key, that is used to protect the data when using ServerSideEncryption.
This section provides a complete list of the Azure Authentication properties you can configure in the connection string for this provider.
Property | Description |
AzureStorageAccount | The name of your Azure storage account. |
AzureAccessKey | The storage key associated with your Azure account. |
AzureSharedAccessSignature | A shared access key signature that may be used for authentication. |
AzureTenant | The Microsoft Online tenant being used to access data. If not specified, your default tenant is used. |
AzureEnvironment | The Azure Environment to use when establishing a connection. |
The name of your Azure storage account.
The name of your Azure storage account.
The storage key associated with your Azure account.
The storage key associated with your Azure account. You can retrieve it as follows:
The Microsoft Online tenant being used to access data. If not specified, your default tenant is used.
The Microsoft Online tenant being used to access data. For instance, contoso.onmicrosoft.com. Alternatively, specify the tenant Id. This value is the directory Id in the Azure Portal > Azure Active Directory > Properties.
Typically it is not necessary to specify the Tenant. It can be determined automatically by Microsoft when OAuthGrantType is set to CODE (default). However, this may fail if the user belongs to multiple tenants; for instance, if an Admin of domain A invites a user of domain B to be a guest user, that user now belongs to both tenants. It is good practice to specify the Tenant, although connections should normally work without it.
The AzureTenant is required when setting OAuthGrantType to CLIENT. When using client credentials, there is no user context. The credentials are taken from the context of the app itself. While Microsoft still allows client credentials to be obtained without specifying which Tenant, it has a much lower probability of picking the specific tenant you want to work with. For this reason, we require AzureTenant to be explicitly stated for all client credentials connections to ensure you get credentials that are applicable for the domain you intend to connect to.
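For illustration (the account, tenant, and OAuth application values are placeholders), a client-credentials connection might look like:
URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AuthScheme=OAuth; OAuthGrantType=CLIENT; OAuthClientId=myAppId; OAuthClientSecret=myAppSecret; AzureTenant=contoso.onmicrosoft.com;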
The Azure Environment to use when establishing a connection.
In most cases, leaving the environment set to global will work. However, if your Azure Account has been added to a different environment, the AzureEnvironment may be used to specify which environment. The available values are GLOBAL, CHINA, USGOVT, USGOVTDOD.
This section provides a complete list of the SSO properties you can configure in the connection string for this provider.
Property | Description |
SSOLoginURL | The identity provider's login URL. |
SSOProperties | Additional properties required to connect to the identity provider in a semicolon-separated list. |
SSOExchangeUrl | The URL used for consuming the SAML response and exchanging it for service specific credentials. |
The identity provider's login URL.
The identity provider's login URL.
Additional properties required to connect to the identity provider in a semicolon-separated list.
Additional properties required to connect to the identity provider in a semicolon-separated list. SSOProperties is used in conjunction with the SSOLoginURL.
SSO configuration is discussed further in .
The URL used for consuming the SAML response and exchanging it for service specific credentials.
The CData Sync App will use the URL specified here to consume a SAML response and exchange it for service specific credentials. The retrieved credentials are the final piece during the SSO connection that are used to communicate with Parquet.
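As a hypothetical sketch only (the AuthScheme value, URLs, and credentials below are placeholders; consult your identity provider's SSO configuration for the exact properties), an SSO connection might combine:
AuthScheme=OKTA; SSOLoginURL=https://mycompany.okta.com/home/app/abc123; User=user@mycompany.com; Password=password1; SSOExchangeUrl=https://service.example.com/saml/exchange;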
This section provides a complete list of the OAuth properties you can configure in the connection string for this provider.
Property | Description |
OAuthVersion | The version of OAuth being used. |
OAuthClientId | The client Id assigned when you register your application with an OAuth authorization server. |
OAuthClientSecret | The client secret assigned when you register your application with an OAuth authorization server. |
Scope | Specify scope to obtain the initial access and refresh token. |
OAuthGrantType | The grant type for the OAuth flow. |
OAuthPasswordGrantMode | How to pass the Client Id and Secret when OAuthGrantType is set to Password. |
OAuthIncludeCallbackURL | Whether to include the callback URL in an access token request. |
OAuthAuthorizationURL | The authorization URL for the OAuth service. |
OAuthAccessTokenURL | The URL to retrieve the OAuth access token from. |
OAuthRefreshTokenURL | The URL to refresh the OAuth token from. |
OAuthRequestTokenURL | The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0. |
AuthToken | The authentication token used to request and obtain the OAuth Access Token. |
AuthKey | The authentication secret used to request and obtain the OAuth Access Token. |
OAuthParams | A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value. |
The version of OAuth being used.
The version of OAuth being used. The following options are available: 1.0, 2.0.
The client Id assigned when you register your application with an OAuth authorization server.
As part of registering an OAuth application, you will receive the OAuthClientId value, sometimes also called a consumer key, and a client secret, the OAuthClientSecret.
The client secret assigned when you register your application with an OAuth authorization server.
As part of registering an OAuth application, you will receive the OAuthClientId, also called a consumer key. You will also receive a client secret, also called a consumer secret. Set the client secret in the OAuthClientSecret property.
Specify scope to obtain the initial access and refresh token.
Specify scope to obtain the initial access and refresh token.
The grant type for the OAuth flow.
The following options are available: CODE, CLIENT, PASSWORD.
How to pass the Client Id and Secret when OAuthGrantType is set to Password.
The OAuth RFC specifies two methods of passing the OAuthClientId and OAuthClientSecret when using the Password OAuthGrantType. The most commonly used is to pass them via post data to the service. However, some services may require that you pass them via the Authorization header, as used in BASIC authorization. Change this property to Basic to submit the parameters as part of the Authorization header instead of the post data.
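For illustration (the client credentials, user credentials, and token URL are placeholders), a password-grant connection might look like:
AuthScheme=OAuth; OAuthVersion=2.0; OAuthGrantType=PASSWORD; OAuthPasswordGrantMode=Basic; OAuthClientId=myClientId; OAuthClientSecret=myClientSecret; User=user1; Password=password1; OAuthAccessTokenURL=https://idp.example.com/token;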
Whether to include the callback URL in an access token request.
This defaults to true since standards-compliant OAuth services will ignore the redirect_uri parameter for grant types like CLIENT or PASSWORD that do not require it.
This option should only be disabled for OAuth services that report errors when redirect_uri is included.
The authorization URL for the OAuth service.
The authorization URL for the OAuth service. At this URL, the user logs into the server and grants permissions to the application. In OAuth 1.0, if permissions are granted, the request token is authorized.
The URL to retrieve the OAuth access token from.
The URL to retrieve the OAuth access token from. In OAuth 1.0, the authorized request token is exchanged for the access token at this URL.
The URL to refresh the OAuth token from.
The URL to refresh the OAuth token from. In OAuth 2.0, this URL is where the refresh token is exchanged for a new access token when the old access token expires.
The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0.
The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0. In OAuth 1.0, this is the URL where the app makes a request for the request token.
The authentication token used to request and obtain the OAuth Access Token.
This property is required only when performing headless authentication in OAuth 1.0. It can be obtained from the GetOAuthAuthorizationUrl stored procedure.
It can be supplied alongside the AuthKey in the GetOAuthAccessToken stored procedure to obtain the OAuthAccessToken.
The authentication secret used to request and obtain the OAuth Access Token.
This property is required only when performing headless authentication in OAuth 1.0. It can be obtained from the GetOAuthAuthorizationUrl stored procedure.
It can be supplied alongside the AuthToken in the GetOAuthAccessToken stored procedure to obtain the OAuthAccessToken.
A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value.
A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value.
This section provides a complete list of the JWT OAuth properties you can configure in the connection string for this provider.
Property | Description |
OAuthJWTCert | The JWT Certificate store. |
OAuthJWTCertType | The type of key store containing the JWT Certificate. |
OAuthJWTCertPassword | The password for the OAuth JWT certificate. |
OAuthJWTCertSubject | The subject of the OAuth JWT certificate. |
The JWT Certificate store.
The name of the certificate store for the client certificate.
The OAuthJWTCertType field specifies the type of the certificate store specified by OAuthJWTCert. If the store is password protected, specify the password in OAuthJWTCertPassword.
OAuthJWTCert is used in conjunction with the OAuthJWTCertSubject field in order to specify client certificates. If OAuthJWTCert has a value, and OAuthJWTCertSubject is set, a search for a certificate is initiated. Please refer to the OAuthJWTCertSubject field for details.
Designations of certificate stores are platform-dependent.
The following are designations of the most common User and Machine certificate stores in Windows:
MY | A certificate store holding personal certificates with their associated private keys. |
CA | Certifying authority certificates. |
ROOT | Root certificates. |
SPC | Software publisher certificates. |
In Java, the certificate store normally is a file containing certificates and optional private keys.
When the certificate store type is PFXFile, this property must be set to the name of the file. When the type is PFXBlob, the property must be set to the binary contents of a PFX file (i.e. PKCS12 certificate store).
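As an example (the file path, password, and subject below are placeholders), a PFX-based JWT certificate configuration might look like:
OAuthJWTCert=C:\certs\service.pfx; OAuthJWTCertType=PFXFILE; OAuthJWTCertPassword=myPfxPassword; OAuthJWTCertSubject=*;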
The type of key store containing the JWT Certificate.
This property can take one of the following values:
USER | For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note: This store type is not available in Java. |
MACHINE | For Windows, this specifies that the certificate store is a machine store. Note: this store type is not available in Java. |
PFXFILE | The certificate store is the name of a PFX (PKCS12) file containing certificates. |
PFXBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in PFX (PKCS12) format. |
JKSFILE | The certificate store is the name of a Java key store (JKS) file containing certificates. Note: this store type is only available in Java. |
JKSBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in Java key store (JKS) format. Note: this store type is only available in Java. |
PEMKEY_FILE | The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate. |
PEMKEY_BLOB | The certificate store is a string (base64-encoded) that contains a private key and an optional certificate. |
PUBLIC_KEY_FILE | The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate. |
PUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate. |
SSHPUBLIC_KEY_FILE | The certificate store is the name of a file that contains an SSH-style public key. |
SSHPUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains an SSH-style public key. |
P7BFILE | The certificate store is the name of a PKCS7 file containing certificates. |
PPKFILE | The certificate store is the name of a file that contains a PPK (PuTTY Private Key). |
XMLFILE | The certificate store is the name of a file that contains a certificate in XML format. |
XMLBLOB | The certificate store is a string that contains a certificate in XML format. |
GOOGLEJSON | The certificate store is the name of a JSON file containing the service account information. Only valid when connecting to a Google service. |
GOOGLEJSONBLOB | The certificate store is a string that contains the service account JSON. Only valid when connecting to a Google service. |
The password for the OAuth JWT certificate.
If the certificate store is of a type that requires a password, this property is used to specify that password in order to open the certificate store.
This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys are not encrypted.
The subject of the OAuth JWT certificate.
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property.
If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma separated list of distinguished name fields and values. For instance "CN=www.server.com, OU=test, C=US, [email protected]". Common fields and their meanings are displayed below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma it must be quoted.
This section provides a complete list of the Kerberos properties you can configure in the connection string for this provider.
Property | Description |
KerberosKDC | The Kerberos Key Distribution Center (KDC) service used to authenticate the user. |
KerberosRealm | The Kerberos Realm used to authenticate the user. |
KerberosSPN | The service principal name (SPN) for the Kerberos Domain Controller. |
KerberosKeytabFile | The Keytab file containing your pairs of Kerberos principals and encrypted keys. |
KerberosServiceRealm | The Kerberos realm of the service. |
KerberosServiceKDC | The Kerberos KDC of the service. |
KerberosTicketCache | The full file path to an MIT Kerberos credential cache file. |
The Kerberos Key Distribution Center (KDC) service used to authenticate the user.
The Kerberos properties are used when using SPNEGO or Windows Authentication. The Sync App will request session tickets and temporary session keys from the Kerberos KDC service. The Kerberos KDC service is conventionally colocated with the domain controller.
If Kerberos KDC is not specified, the Sync App will attempt to detect these properties automatically from the following locations:
The Kerberos Realm used to authenticate the user.
The Kerberos properties are used when using SPNEGO or Windows Authentication. The Kerberos Realm is used to authenticate the user with the Kerberos Key Distribution Service (KDC). The Kerberos Realm can be configured by an administrator to be any string, but conventionally it is based on the domain name.
If Kerberos Realm is not specified, the Sync App will attempt to detect these properties automatically from the following locations:
The service principal name (SPN) for the Kerberos Domain Controller.
If the SPN on the Kerberos Domain Controller is not the same as the URL that you are authenticating to, use this property to set the SPN.
The Keytab file containing your pairs of Kerberos principals and encrypted keys.
The Keytab file containing your pairs of Kerberos principals and encrypted keys.
The Kerberos realm of the service.
The KerberosServiceRealm is used to specify the service Kerberos realm when using cross-realm Kerberos authentication.
In most cases, a single realm and KDC machine are used to perform the Kerberos authentication and this property is not required.
This property is available for complex setups where a different realm and KDC machine are used to obtain an authentication ticket (AS request) and a service ticket (TGS request).
The Kerberos KDC of the service.
The KerberosServiceKDC is used to specify the service Kerberos KDC when using cross-realm Kerberos authentication.
In most cases, a single realm and KDC machine are used to perform the Kerberos authentication and this property is not required.
This property is available for complex setups where a different realm and KDC machine are used to obtain an authentication ticket (AS request) and a service ticket (TGS request).
The full file path to an MIT Kerberos credential cache file.
This property can be set if you wish to use a credential cache file that was created using the MIT Kerberos Ticket Manager or kinit command.
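As a sketch (the KDC host, realm, SPN, and keytab path are placeholders, and any other required connection properties are omitted), a Kerberos configuration might combine:
KerberosKDC=kerberos.example.com; KerberosRealm=EXAMPLE.COM; KerberosSPN=HTTP/hdfs01.example.com; KerberosKeytabFile=C:\krb\user1.keytab;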
This section provides a complete list of the SSL properties you can configure in the connection string for this provider.
Property | Description |
SSLClientCert | The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL). |
SSLClientCertType | The type of key store containing the TLS/SSL client certificate. |
SSLClientCertPassword | The password for the TLS/SSL client certificate. |
SSLClientCertSubject | The subject of the TLS/SSL client certificate. |
SSLMode | The authentication mechanism to be used when connecting to the FTP or FTPS server. |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL).
The name of the certificate store for the client certificate.
The SSLClientCertType field specifies the type of the certificate store specified by SSLClientCert. If the store is password protected, specify the password in SSLClientCertPassword.
SSLClientCert is used in conjunction with the SSLClientCertSubject field in order to specify client certificates. If SSLClientCert has a value, and SSLClientCertSubject is set, a search for a certificate is initiated. See SSLClientCertSubject for more information.
Designations of certificate stores are platform-dependent.
The following are designations of the most common User and Machine certificate stores in Windows:
MY | A certificate store holding personal certificates with their associated private keys. |
CA | Certifying authority certificates. |
ROOT | Root certificates. |
SPC | Software publisher certificates. |
In Java, the certificate store normally is a file containing certificates and optional private keys.
When the certificate store type is PFXFile, this property must be set to the name of the file. When the type is PFXBlob, the property must be set to the binary contents of a PFX file (for example, PKCS12 certificate store).
The type of key store containing the TLS/SSL client certificate.
This property can take one of the following values:
USER - default | For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note that this store type is not available in Java. |
MACHINE | For Windows, this specifies that the certificate store is a machine store. Note that this store type is not available in Java. |
PFXFILE | The certificate store is the name of a PFX (PKCS12) file containing certificates. |
PFXBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in PFX (PKCS12) format. |
JKSFILE | The certificate store is the name of a Java key store (JKS) file containing certificates. Note that this store type is only available in Java. |
JKSBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in JKS format. Note that this store type is only available in Java. |
PEMKEY_FILE | The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate. |
PEMKEY_BLOB | The certificate store is a string (base64-encoded) that contains a private key and an optional certificate. |
PUBLIC_KEY_FILE | The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate. |
PUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate. |
SSHPUBLIC_KEY_FILE | The certificate store is the name of a file that contains an SSH-style public key. |
SSHPUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains an SSH-style public key. |
P7BFILE | The certificate store is the name of a PKCS7 file containing certificates. |
PPKFILE | The certificate store is the name of a file that contains a PuTTY Private Key (PPK). |
XMLFILE | The certificate store is the name of a file that contains a certificate in XML format. |
XMLBLOB | The certificate store is a string that contains a certificate in XML format. |
The password for the TLS/SSL client certificate.
If the certificate store is of a type that requires a password, this property is used to specify that password to open the certificate store.
The subject of the TLS/SSL client certificate.
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property. If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma separated list of distinguished name fields and values. For example, "CN=www.server.com, OU=test, C=US, [email protected]". The common fields and their meanings are shown below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma, it must be quoted.
The authentication mechanism to be used when connecting to the FTP or FTPS server.
The following values are supported:
NONE | Default plaintext authentication is used to log in to the server. |
IMPLICIT | The SSL negotiation will start immediately after the connection is established. |
EXPLICIT | The Sync App will first connect in plaintext, and then explicitly start SSL negotiation through a protocol command such as STARTTLS. |
AUTOMATIC | If the remote port is set to the standard plaintext port of the protocol (where applicable), the component will behave the same as if SSLMode is set to EXPLICIT. In all other cases, SSL negotiation will be IMPLICIT. |
The certificate to be accepted from the server when connecting using TLS/SSL.
If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine is rejected.
This property can take the following forms:
Description | Example |
A full PEM Certificate (example shortened for brevity) | -----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE----- |
A path to a local file containing the certificate | C:\cert.cer |
The public key (example shortened for brevity) | -----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY----- |
The MD5 Thumbprint (hex values can also be either space or colon separated) | ecadbdda5a1529c58a1e9e09828d70e4 |
The SHA1 Thumbprint (hex values can also be either space or colon separated) | 34a929226ae0819f2ec14b4a3d904f801cbb150d |
If not specified, any certificate trusted by the machine is accepted.
Use '*' to accept all certificates. Note that this is not recommended due to security concerns.
This section provides a complete list of the SSH properties you can configure in the connection string for this provider.
Property | Description |
SSHAuthMode | The authentication method used when establishing an SSH Tunnel to the service. |
SSHClientCert | A certificate to be used for authenticating the SSHUser. |
SSHClientCertPassword | The password of the SSHClientCert key if it has one. |
SSHClientCertSubject | The subject of the SSH client certificate. |
SSHClientCertType | The type of SSHClientCert private key. |
SSHUser | The SSH user. |
SSHPassword | The SSH password. |
The authentication method used when establishing an SSH Tunnel to the service.
A certificate to be used for authenticating the SSHUser.
SSHClientCert must contain a valid private key in order to use public key authentication. A public key is optional, if one is not included then the Sync App generates it from the private key. The Sync App sends the public key to the server and the connection is allowed if the user has authorized the public key.
The SSHClientCertType field specifies the type of the key store specified by SSHClientCert. If the store is password protected, specify the password in SSHClientCertPassword.
Some types of key stores are containers which may include multiple keys. By default the Sync App will select the first key in the store, but you can specify a specific key using SSHClientCertSubject.
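For illustration (the host, key path, and passphrase below are placeholders), a public key SFTP connection might look like:
URI=sftp://sftp.example.com:22/folder1; SSHAuthMode=PublicKey; SSHUser=user1; SSHClientCert=C:\keys\id_rsa.pem; SSHClientCertType=PEMKEY_FILE; SSHClientCertPassword=myKeyPassphrase;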
The password of the SSHClientCert key if it has one.
This property is only used when authenticating to SFTP servers with SSHAuthMode set to PublicKey and SSHClientCert set to a private key.
The subject of the SSH client certificate.
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property.
If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma separated list of distinguished name fields and values. For instance "CN=www.server.com, OU=test, C=US, [email protected]". Common fields and their meanings are displayed below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma it must be quoted.
The type of SSHClientCert private key.
This property can take one of the following values:
Types | Description | Allowed Blob Values |
MACHINE/USER | | Blob values are not supported. |
JKSFILE/JKSBLOB | | base64-only |
PFXFILE/PFXBLOB | A PKCS12-format (.pfx) file. Must contain both a certificate and a private key. | base64-only |
PEMKEY_FILE/PEMKEY_BLOB | A PEM-format file. Must contain an RSA, DSA, or OPENSSH private key. Can optionally contain a certificate matching the private key. | base64 or plain text. Newlines may be replaced with spaces when providing the blob as text. |
PPKFILE/PPKBLOB | A PuTTY-format private key created using the puttygen tool. | base64-only |
XMLFILE/XMLBLOB | An XML key in the format generated by the .NET RSA class: RSA.ToXmlString(true). | base64 or plain text. |
The SSH user.
The SSH user.
The SSH password.
The SSH password.
This section provides a complete list of the Firewall properties you can configure in the connection string for this provider.
Property | Description |
FirewallType | The protocol used by a proxy-based firewall. |
FirewallServer | The name or IP address of a proxy-based firewall. |
FirewallPort | The TCP port for a proxy-based firewall. |
FirewallUser | The user name to use to authenticate with a proxy-based firewall. |
FirewallPassword | A password used to authenticate to a proxy-based firewall. |
The protocol used by a proxy-based firewall.
This property specifies the protocol that the Sync App will use to tunnel traffic through the FirewallServer proxy. Note that by default, the Sync App connects to the system proxy; to disable this behavior and connect to one of the following proxy types, set ProxyAutoDetect to false.
Type | Default Port | Description |
TUNNEL | 80 | When this is set, the Sync App opens a connection to Parquet and traffic flows back and forth through the proxy. |
SOCKS4 | 1080 | When this is set, the Sync App sends data through the SOCKS 4 proxy specified by FirewallServer and FirewallPort and passes the FirewallUser value to the proxy, which determines if the connection request should be granted. |
SOCKS5 | 1080 | When this is set, the Sync App sends data through the SOCKS 5 proxy specified by FirewallServer and FirewallPort. If your proxy requires authentication, set FirewallUser and FirewallPassword to credentials the proxy recognizes. |
To connect to HTTP proxies, use ProxyServer and ProxyPort. To authenticate to HTTP proxies, use ProxyAuthScheme, ProxyUser, and ProxyPassword.
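As an example (the server address and credentials are placeholders), a SOCKS5 firewall configuration might look like:
ProxyAutoDetect=false; FirewallType=SOCKS5; FirewallServer=10.0.0.5; FirewallPort=1080; FirewallUser=user1; FirewallPassword=password1;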
The name or IP address of a proxy-based firewall.
This property specifies the IP address, DNS name, or host name of a proxy allowing traversal of a firewall. The protocol is specified by FirewallType: Use FirewallServer with this property to connect through SOCKS or do tunneling. Use ProxyServer to connect to an HTTP proxy.
Note that the Sync App uses the system proxy by default. To use a different proxy, set ProxyAutoDetect to false.
The TCP port for a proxy-based firewall.
This specifies the TCP port for a proxy allowing traversal of a firewall. Use FirewallServer to specify the name or IP address. Specify the protocol with FirewallType.
The user name to use to authenticate with a proxy-based firewall.
The FirewallUser and FirewallPassword properties are used to authenticate against the proxy specified in FirewallServer and FirewallPort, following the authentication method specified in FirewallType.
A password used to authenticate to a proxy-based firewall.
This property is passed to the proxy specified by FirewallServer and FirewallPort, following the authentication method specified by FirewallType.
This section provides a complete list of the Proxy properties you can configure in the connection string for this provider.
Property | Description |
ProxyAutoDetect | This indicates whether to use the system proxy settings or not. |
ProxyServer | The hostname or IP address of a proxy to route HTTP traffic through. |
ProxyPort | The TCP port the ProxyServer proxy is running on. |
ProxyAuthScheme | The authentication type to use to authenticate to the ProxyServer proxy. |
ProxyUser | A user name to be used to authenticate to the ProxyServer proxy. |
ProxyPassword | A password to be used to authenticate to the ProxyServer proxy. |
ProxySSLType | The SSL type to use when connecting to the ProxyServer proxy. |
ProxyExceptions | A semicolon separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer . |
This indicates whether to use the system proxy settings or not.
This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings.
To connect to an HTTP proxy, see ProxyServer. For other proxies, such as SOCKS or tunneling, see FirewallType.
The hostname or IP address of a proxy to route HTTP traffic through.
The hostname or IP address of a proxy to route HTTP traffic through. The Sync App can use the HTTP, Windows (NTLM), or Kerberos authentication types to authenticate to an HTTP proxy.
If you need to connect through a SOCKS proxy or tunnel the connection, see FirewallType.
By default, the Sync App uses the system proxy. If you need to use another proxy, set ProxyAutoDetect to false.
The TCP port the ProxyServer proxy is running on.
The port the HTTP proxy is running on that you want to redirect HTTP traffic through. Specify the HTTP proxy in ProxyServer. For other proxy types, see FirewallType.
The authentication type to use to authenticate to the ProxyServer proxy.
This value specifies the authentication type to use to authenticate to the HTTP proxy specified by ProxyServer and ProxyPort.
Note that the Sync App will use the system proxy settings by default, without further configuration needed; if you want to connect to another proxy, you will need to set ProxyAutoDetect to false, in addition to ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.
The authentication type can be one of the following:
If you need to use another authentication type, such as SOCKS 5 authentication, see FirewallType.
A user name to be used to authenticate to the ProxyServer proxy.
The ProxyUser and ProxyPassword options are used to connect and authenticate against the HTTP proxy specified in ProxyServer.
You can select one of the available authentication types in ProxyAuthScheme. If you are using HTTP authentication, set this to the user name of a user recognized by the HTTP proxy. If you are using Windows or Kerberos authentication, set this property to a user name in one of the following formats:
user@domain
domain\user
A password to be used to authenticate to the ProxyServer proxy.
This property is used to authenticate to an HTTP proxy server that supports NTLM (Windows), Kerberos, or HTTP authentication. To specify the HTTP proxy, you can set ProxyServer and ProxyPort. To specify the authentication type, set ProxyAuthScheme.
If you are using HTTP authentication, additionally set ProxyUser and ProxyPassword to a user and password recognized by the HTTP proxy.
If you are using NTLM authentication, set ProxyUser and ProxyPassword to your Windows user name and password. You may also need these to complete Kerberos authentication.
For SOCKS 5 authentication or tunneling, see FirewallType.
By default, the Sync App uses the system proxy. If you want to connect to another proxy, set ProxyAutoDetect to false.
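For example, a connection string along the following lines (server, port, and credentials are placeholder values) routes traffic through an HTTP proxy and authenticates to it; the ProxyAuthScheme value shown assumes the proxy uses HTTP (BASIC) authentication:
ProxyAutoDetect=False; ProxyServer=192.168.1.100; ProxyPort=8080; ProxyAuthScheme=BASIC; ProxyUser=proxyuser; ProxyPassword=proxypassword;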
The SSL type to use when connecting to the ProxyServer proxy.
This property determines when to use SSL for the connection to an HTTP proxy specified by ProxyServer. The applicable values are the following:
AUTO | Default setting. If the URL is an HTTPS URL, the Sync App will use the TUNNEL option. If the URL is an HTTP URL, the Sync App will use the NEVER option. |
ALWAYS | The connection is always SSL enabled. |
NEVER | The connection is not SSL enabled. |
TUNNEL | The connection is through a tunneling proxy. The proxy server opens a connection to the remote host and traffic flows back and forth through the proxy. |
A semicolon separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer.
The ProxyServer is used for all addresses, except for addresses defined in this property. Use semicolons to separate entries.
Note that the Sync App uses the system proxy settings by default, with no further configuration needed. To explicitly configure proxy exceptions for this connection, set ProxyAutoDetect to false and configure ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme, along with ProxyUser and ProxyPassword if needed.
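As an illustration (host names are placeholders), the following connection string routes all traffic through the proxy except requests to the listed hosts; the value is quoted here as a precaution because the list separator matches the connection string separator:
ProxyAutoDetect=False; ProxyServer=192.168.1.100; ProxyPort=8080; ProxyExceptions="localhost;127.0.0.1";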
This section provides a complete list of the Logging properties you can configure in the connection string for this provider.
Property | Description |
LogModules | Core modules to be included in the log file. |
Core modules to be included in the log file.
Only the modules specified (separated by ';') will be included in the log file. By default all modules are included.
See the Logging page for an overview.
This section provides a complete list of the Schema properties you can configure in the connection string for this provider.
Property | Description |
Location | A path to the directory that contains the schema files defining tables, views, and stored procedures. |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC. |
Tables | This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC. |
Views | Restricts the views reported to a subset of the available tables. For example, Views=ViewA,ViewB,ViewC. |
FlattenObjects | Set FlattenObjects to true to flatten object properties into columns of their own. Otherwise, objects nested in arrays are returned as strings of JSON. |
FlattenArrays | By default, nested arrays are returned as strings. The FlattenArrays property can be used to flatten the elements of nested arrays into columns of their own. Set FlattenArrays to the number of elements you want to return from nested arrays. |
A path to the directory that contains the schema files defining tables, views, and stored procedures.
The path to a directory which contains the schema files for the Sync App (.rsd files for tables and views, .rsb files for stored procedures). The folder location can be a relative path from the location of the executable. The Location property is only needed if you want to customize definitions (for example, change a column name, ignore a column, and so on) or extend the data model with new tables, views, or stored procedures.
If left unspecified, the default location is "%APPDATA%\CData\Parquet Data Provider\Schema" with %APPDATA% being set to the user's configuration directory:
Platform | %APPDATA% |
Windows | The value of the APPDATA environment variable |
Linux | ~/.config |
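For example, assuming custom schema files are kept in a hypothetical folder C:\ParquetSchemas, the following connection string points the Sync App at them:
URI=C:\folder1; Location=C:\ParquetSchemas;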
This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.
Listing the schemas from databases can be expensive. Providing a list of schemas in the connection string improves the performance.
This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC.
Listing the tables from some databases can be expensive. Providing a list of tables in the connection string improves the performance of the Sync App.
This property can also be used as an alternative to automatically listing tables if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the tables you want in a comma-separated list. Each table should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the table in this property, as in the last example here, to avoid ambiguity between tables that exist in multiple catalogs or schemas.
Restricts the views reported to a subset of the available tables. For example, Views=ViewA,ViewB,ViewC.
Listing the views from some databases can be expensive. Providing a list of views in the connection string improves the performance of the Sync App.
This property can also be used as an alternative to automatically listing views if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the views you want in a comma-separated list. Each view should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the view in this property, as in the last example here, to avoid ambiguity between views that exist in multiple catalogs or schemas.
Set FlattenObjects to true to flatten object properties into columns of their own. Otherwise, objects nested in arrays are returned as strings of JSON.
Set FlattenObjects to true to flatten object properties into columns of their own. Otherwise, objects nested in arrays are returned as strings of JSON. To generate the column name, the Sync App concatenates the property name onto the object name with a dot.
For example, you can flatten the nested objects below at connection time:
[ { "grade": "A", "score": 2 }, { "grade": "A", "score": 6 }, { "grade": "A", "score": 10 }, { "grade": "A", "score": 9 }, { "grade": "B", "score": 14 } ]When FlattenObjects is set to true and FlattenArrays is set to 1, the preceding array is flattened into the following table:
Column Name | Column Value |
grades.0.grade | A |
grades.0.score | 2 |
By default, nested arrays are returned as strings. The FlattenArrays property can be used to flatten the elements of nested arrays into columns of their own. Set FlattenArrays to the number of elements you want to return from nested arrays.
By default, nested arrays are returned as strings. The FlattenArrays property can be used to flatten the elements of nested arrays into columns of their own. This is only recommended for arrays that are expected to be short.
Set FlattenArrays to the number of elements you want to return from nested arrays. The specified elements are returned as columns. The zero-based index is concatenated to the column name. Other elements are ignored.
For example, you can return an arbitrary number of elements from an array of strings:
["FLOW-MATIC","LISP","COBOL"]When FlattenArrays is set to 1, the preceding array is flattened into the following table:
Column Name | Column Value |
languages.0 | FLOW-MATIC |
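For example (a sketch reusing the sample array above), setting FlattenArrays to 3 would also expose languages.1 and languages.2 as columns:
URI=C:\folder1; FlattenArrays=3;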
This section provides a complete list of the Miscellaneous properties you can configure in the connection string for this provider.
Property | Description |
AggregateFiles | When set to true, the provider will aggregate all the files in the URI directory into a single result. With this option enabled, AggregatedFiles is exposed and can be used to query the dataset. |
Charset | Specifies the session character set for encoding and decoding character data transferred to and from the Parquet file. The default value is UTF-8. |
ClientCulture | This property can be used to specify the format of data (e.g., currency values) that is accepted by the client application. This property can be used when the client application does not support the machine's culture settings. For example, Microsoft Access requires 'en-US'. |
Compression | Specifies which compression encoding to use when creating .parquet files using Create Table Statement and Bulk Inserts. |
Culture | This setting can be used to specify culture settings that determine how the provider interprets certain data types that are passed into the provider. For example, setting Culture='de-DE' will output German formats even on an American machine. |
DeleteDownloadedFiles | When set to true, the provider will delete parsed .parquet files downloaded from cloud sources. |
DirectoryRetrievalDepth | Limit the subfolders recursively scanned when IncludeSubdirectories is enabled. |
EnableDictionary | When set to true, the provider will enable dictionary encoding when creating .parquet files using Create Table Statement and Bulk Inserts. |
ExcludeFiles | Comma-separated list of file extensions to exclude from the set of the files modeled as tables. |
FolderId | The ID of a folder in Google Drive. If set, the resource location specified by the URI is relative to the Folder ID for all operations. |
IncludeDropboxTeamResources | Indicates if you want to include Dropbox team files and folders. |
IncludeFiles | Comma-separated list of file extensions to include into the set of the files modeled as tables. |
IncludeItemsFromAllDrives | Whether Google Drive shared drive items should be included in results. If not present or set to false, then shared drive items are not returned. |
IncludeSubdirectories | Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names. |
InsertMode | The behavior when using bulk inserts to create Parquet files. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses. |
MetadataDiscoveryURI | Used when aggregating multiple files into one table, this property specifies a specific file to read to determine the aggregated table schema. |
Other | These hidden properties are used only in specific use cases. |
PageSize | (Optional) PageSize value. |
PathSeparator | Determines the character which will be used to replace the file separator. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns to the table. |
TemporaryLocalFolder | The path, or URI, to the folder that is used to temporarily download parquet file(s). |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
UserDefinedViews | A filepath pointing to the JSON configuration file containing your custom views. |
When set to true, the provider will aggregate all the files in the URI directory into a single result. With this option enabled, AggregatedFiles is exposed and can be used to query the dataset.
When set to true, the provider will aggregate all the files in the URI directory into a single result. With this option enabled, AggregatedFiles is exposed and can be used to query the dataset. By default, the first file in the folder is used to define the schema; MetadataDiscoveryURI can be specified to use a different file instead.
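For example, the following connection string (the file name is a placeholder) aggregates all Parquet files in the folder and reads the aggregated schema from a specific file rather than the first one:
URI=C:\folder; AggregateFiles=True; MetadataDiscoveryURI=C:\folder\reference.parquet;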
Specifies the session character set for encoding and decoding character data transferred to and from the Parquet file. The default value is UTF-8.
Specifies the session character set for encoding and decoding character data transferred to and from the Parquet file. The default value is UTF-8.
This property can be used to specify the format of data (e.g., currency values) that is accepted by the client application. This property can be used when the client application does not support the machine's culture settings. For example, Microsoft Access requires 'en-US'.
This option affects the format of Sync App output. To specify the format that defines how input should be interpreted, use the Culture option. By default the Sync App uses the current locale settings of the machine to interpret input and format output.
Specifies which compression encoding to use when creating .parquet files using Create Table Statement and Bulk Inserts.
Specifies which compression encoding to use when creating .parquet files using Create Table Statement and Bulk Inserts.
This setting can be used to specify culture settings that determine how the provider interprets certain data types that are passed into the provider. For example, setting Culture='de-DE' will output German formats even on an American machine.
This property affects the Sync App input. To interpret values in a different cultural format, use the ClientCulture property. By default the Sync App uses the current locale settings of the machine to interpret input and format output.
When set to true, the provider will delete parsed .parquet files downloaded from cloud sources.
When set to true, the provider will delete parsed .parquet files downloaded from cloud sources and stored in the directory specified by the TemporaryLocalFolder connection property.
Limit the subfolders recursively scanned when IncludeSubdirectories is enabled.
When IncludeSubdirectories is enabled, DirectoryRetrievalDepth specifies how many subfolders will be recursively scanned before stopping. -1 specifies that all subfolders are scanned.
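For example, the following connection string (the folder path is a placeholder) reads files from C:\folder1 and from subfolders up to two levels deep:
URI=C:\folder1; IncludeSubdirectories=True; DirectoryRetrievalDepth=2;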
When set to true, the provider will enable dictionary encoding when creating .parquet files using Create Table Statement and Bulk Inserts.
When set to true, the provider will enable dictionary encoding when creating .parquet files using Create Table Statement and Bulk Inserts. Using dictionary encoding can help reduce the size of the resulting .parquet file if a column has repeated values. Columns with all unique values will not use dictionary encoding even if this property is set to true.
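For example, the following connection string enables dictionary encoding for files the provider creates; the Compression value shown is an assumption and should be checked against the Compression property reference for the accepted encodings:
URI=C:\folder1; Compression=Snappy; EnableDictionary=True;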
Comma-separated list of file extensions to exclude from the set of the files modeled as tables.
It is also possible to specify datetime filters. We currently support CreatedDate and ModifiedDate. All extension filters are evaluated in disjunction (using the OR operator), and then the resulting filter is evaluated in conjunction (using the AND operator) with the datetime filters.
Examples:
ExcludeFiles="TXT,CreatedDate<='2020-11-26T07:39:34-05:00'"
ExcludeFiles="TXT,ModifiedDate<=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 50, 000)"
ExcludeFiles="ModifiedDate>=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 49, 000),ModifiedDate<=CURRENT_TIMESTAMP()"
The ID of a folder in Google Drive. If set, the resource location specified by the URI is relative to the Folder ID for all operations.
The ID of a folder in Google Drive. If set, the resource location specified by the URI is relative to the Folder ID for all operations.
Indicates if you want to include Dropbox team files and folders.
To access Dropbox team folders and files, set this connection property to True.
Comma-separated list of file extensions to include into the set of the files modeled as tables.
Comma-separated list of file extensions to include into the set of the files modeled as tables. For example, IncludeFiles=parquet,TXT. The default is parquet.
A '*' value can be specified to include all files. A 'NOEXT' value can be specified to include files without an extension.
It is also possible to specify datetime filters. We currently support CreatedDate and ModifiedDate. All extension filters are evaluated in disjunction (using the OR operator), and then the resulting filter is evaluated in conjunction (using the AND operator) with the datetime filters.
Examples:
IncludeFiles="TXT,CreatedDate<='2020-11-26T07:39:34-05:00'"
IncludeFiles="TXT,ModifiedDate<=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 50, 000)"
IncludeFiles="ModifiedDate>=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 49, 000),ModifiedDate<=CURRENT_TIMESTAMP()"
Whether Google Drive shared drive items should be included in results. If not present or set to false, then shared drive items are not returned.
If this property is set to 'True', files will be retrieved from all drives, including shared drives. The file retrieval can be limited to a specific shared drive or a specific folder in that shared drive by setting the start of the URI to the path of the shared drive and optionally any folder within it, for example: 'gdrive://SharedDriveA/FolderA/...'. Additionally, the FolderId property can be used to limit the search to an exact subdirectory.
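For example, the following connection string (the drive and folder names are placeholders) limits retrieval to one folder of a shared drive:
URI=gdrive://SharedDriveA/FolderA; IncludeItemsFromAllDrives=True;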
Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names.
Whether to read files from nested folders. Table names are prefixed by each nested folder name separated by underscores. For example:
File Path | Table Name |
Root\subfolder1\tableA | subfolder1_tableA |
Root\subfolder1\subfolder2\tableA | subfolder1_subfolder2_tableA |
The behavior when using bulk inserts to create Parquet files.
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
Used when aggregating multiple files into one table, this property specifies a specific file to read to determine the aggregated table schema.
Used when aggregating multiple files into one table, this property specifies a specific file to read to determine the aggregated table schema.
These hidden properties are used only in specific use cases.
The properties listed below are available for specific use cases. Normal driver use cases and functionality should not require these properties.
Specify multiple properties in a semicolon-separated list.
DefaultColumnSize | Sets the default length of string fields when the data source does not provide column length in the metadata. The default value is 2000. |
ConvertDateTimeToGMT | Determines whether to convert date-time values to GMT, instead of the local time of the machine. |
RecordToFile=filename | Records the underlying socket data transfer to the specified file. |
(Optional) PageSize value.
The PageSize value specifies the number of rows to fetch at a time.
Determines the character which will be used to replace the file separator.
Determines the character which will be used to replace the file separator. If there is a parquet file located in "Test/Files/Test.parquet" and if this property is set to "_", then the table name for this file would be "Test_Files_Test.parquet".
Note: Backslash (\) cannot be used as a value for this property.
This property indicates whether or not to include pseudo columns as columns to the table.
This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; for example, "*=*".
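For example, the wildcard form described above exposes pseudo columns on all tables; the surrounding quotes are a precaution here because the value itself contains an equals sign:
PseudoColumns="*=*";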
The path, or URI, to the folder that is used to temporarily download parquet file(s).
The path, or URI, to the folder that is used to temporarily download parquet file(s) from cloud sources such as S3 and Azure. For instance: TemporaryLocalFolder='C:/User/Download'. By default, the downloaded files are deleted automatically after parsing; this behavior can be changed using the DeleteDownloadedFiles connection property.
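As a sketch (the bucket name and URI prefix are placeholders; see the relevant cloud connection section for the exact URI form), the following connection string caches downloaded files in a local folder and keeps them after parsing:
URI=s3://bucket1/folder1; TemporaryLocalFolder=C:/Temp/ParquetDownloads; DeleteDownloadedFiles=False;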
The value in seconds until the timeout error is thrown, canceling the operation.
If Timeout = 0, operations do not time out. The operations run until they complete successfully or until they encounter an error condition.
If Timeout expires and the operation is not yet complete, the Sync App throws an exception.
A filepath pointing to the JSON configuration file containing your custom views.
User Defined Views are defined in a JSON-formatted configuration file called UserDefinedViews.json. The Sync App automatically detects the views specified in this file.
You can also have multiple view definitions and control them using the UserDefinedViews connection property. When you use this property, only the specified views are seen by the Sync App.
This User Defined View configuration file is formatted as follows: each root element defines the name of a view, and each root element contains a query element holding the custom SQL query for that view.
For example:
{ "MyView": { "query": "SELECT * FROM SampleTable_1 WHERE MyColumn = 'value'" }, "MyView2": { "query": "SELECT * FROM MyTable WHERE Id IN (1,2,3)" } }Use the UserDefinedViews connection property to specify the location of your JSON configuration file. For example:
"UserDefinedViews", C:\Users\yourusername\Desktop\tmp\UserDefinedViews.jsonNote that the specified path is not embedded in quotation marks.