The CData Sync App provides a straightforward way to continuously pipeline your Avro data to any database, data lake, or data warehouse, making it easily available for Analytics, Reporting, AI, and Machine Learning.
The Avro connector can be used from the CData Sync application to pull data from Avro and move it to any of the supported destinations.
The CData Sync App is designed for streaming Avro only.
This streamed file content does not include all of the metadata associated with remotely stored Avro files, such as file and folder name.
If access to both the file metadata and the actual file content is needed, then the CData Sync App must be used in tandem with the associated file system driver(s) for the service the Avro files are remotely stored in.
The following file system drivers are available:
See the relevant CData file system driver's documentation for a configuration guide for connecting to stored Avro file metadata.
Create a connection to Avro by navigating to the Connections page in the Sync App and selecting the corresponding icon in the Add Connections panel. If the Avro icon is not available, click the Add More icon to download and install the Avro connector from the CData site.
Required properties are listed under the Settings tab. The Advanced tab lists connection properties that are not typically required.
The CData Sync App allows connecting to local and remote Avro resources. Set the URI property to the Avro resource location, in addition to any other properties necessary to connect to your data source.
Set the ConnectionType to Local. Local files support SELECT.
Set the URI to a folder containing Avro files: C:\folder1.
If you need to INSERT, UPDATE, or DELETE data in cloud files, you can download the corresponding CData Sync App for that cloud host (supported via stored procedures), make changes with the local file's corresponding Sync App, then upload the file using the cloud source's stored procedures.
As an example, if you wanted to update a file stored on SharePoint, you could use the CData SharePoint Sync App's DownloadDocument procedure to download the Avro file, update the local Avro file with the CData Avro Sync App, then use the SharePoint Sync App's UploadDocument procedure to upload the changed file to SharePoint.
A unique prefix at the beginning of the URI connection property identifies the cloud data store being targeted by the Sync App. The remainder of the URI is a relative path to the desired folder (one table per file) or to a single file (a single table).
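As a rough illustration of the prefix convention described above, the following Python sketch splits a URI into its store prefix and relative path. The function name `split_store_uri` is hypothetical and not part of the Sync App; the sample URI mirrors the `ibmobjectstorage://` form used in a connection string elsewhere in this document.

```python
def split_store_uri(uri: str) -> tuple:
    """Split 'prefix://relative/path' into (prefix, relative_path)."""
    prefix, separator, relative_path = uri.partition("://")
    if not separator or not prefix:
        # Local paths such as C:\folder1 carry no store prefix.
        raise ValueError(f"URI has no store prefix: {uri!r}")
    return prefix.lower(), relative_path


prefix, path = split_store_uri("ibmobjectstorage://bucket1/folder1")
print(prefix)  # the cloud data store being targeted
print(path)    # the folder (one table per file) or single file
```

A local path raises `ValueError` here, which matches the distinction the document draws between local resources and prefixed cloud resources.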
Set the following to identify your Avro resources stored on Amazon S3:
See Connecting to Amazon S3 for more information regarding how to connect and authenticate to Avro files hosted on Amazon S3.
Set the following to identify your Avro resources stored on Azure Blob Storage:
See Connecting to Azure Blob Storage for more information regarding how to connect and authenticate to Avro files hosted on Azure Blob Storage.
Set the following to identify your Avro resources stored on Azure Data Lake Storage:
See Connecting to Azure Data Lake Storage for more information regarding how to connect and authenticate to Avro files hosted on Azure Data Lake Storage.
Set the following properties to connect:
You can authenticate with either an Azure access key or an Azure shared access signature. Set one of the following:
Set the following to identify your Avro resources stored on Box:
See Connecting to Box for more information regarding how to connect and authenticate to Avro files hosted on Box.
Set the following to identify your Avro resources stored on Dropbox:
See Connecting to Dropbox for more information regarding how to connect and authenticate to Avro files hosted on Dropbox.
The Sync App supports both plaintext and SSL/TLS connections to FTP servers.
Set the following connection properties to connect:
Set the following to identify your Avro resources stored on Google Cloud Storage:
See Connecting to Google Cloud Storage for more information regarding how to connect and authenticate to Avro files hosted on Google Cloud Storage.
Set the following to identify your Avro resources stored on Google Drive:
See Connecting to Google Drive for more information regarding how to connect and authenticate to Avro files hosted on Google Drive.
Set the following to identify your Avro resources stored on HDFS:
There are two authentication methods available for connecting to an HDFS data source: Anonymous Authentication and Negotiate (Kerberos) Authentication.
Anonymous Authentication
In some situations, you can connect to HDFS without any authentication connection properties. To do so, set the AuthScheme property to None (default).
Authenticate using Kerberos
When authentication credentials are required, you can use Kerberos for authentication. See Using Kerberos for details on how to authenticate with Kerberos.
Set the following to identify your Avro resources stored on HTTP streams:
See Connecting to HTTP Streams for more information regarding how to connect and authenticate to Avro files hosted on HTTP Streams.
Set the following to identify your Avro resources stored on IBM Cloud Object Storage:
See Connecting to IBM Object Storage for more information regarding how to connect and authenticate to Avro files hosted on IBM Cloud Object Storage.
Set the following to identify your Avro resources stored on OneDrive:
See Connecting to OneDrive for more information regarding how to connect and authenticate to Avro files hosted on OneDrive.
Set the following properties to authenticate with HMAC:
Set the following to identify your Avro resources stored on SFTP:
See Connecting to SFTP for more information regarding how to connect and authenticate to Avro files hosted on SFTP.
Set the following to identify your Avro resources stored on SharePoint Online:
See Connecting to SharePoint Online for more information regarding how to connect and authenticate to Avro files hosted on SharePoint Online.
To obtain the credentials for an IAM user, follow the steps below:
To obtain the credentials for your AWS root account, follow the steps below:
Specify the following to connect to data:
There are several authentication methods available for connecting to Avro including:
To authenticate using account root credentials, set the following:
Note: Use of this authentication scheme is discouraged by Amazon for anything but simple tests. The account root credentials have the full permissions of the user, making this the least secure authentication method.
If you are using the Sync App from an EC2 Instance and have an IAM Role assigned to the instance, you can use the IAM Role to authenticate. To do so, set the following properties to authenticate:
If you are also using an IAM role to authenticate, you must additionally specify the following:
IMDSv2 Support
The Avro Sync App now supports IMDSv2. Unlike IMDSv1, the new version requires an authentication token. Endpoints and response are the same in both versions. In IMDSv2, the Avro Sync App first attempts to retrieve the IMDSv2 metadata token and then uses it to call AWS metadata endpoints. If it is unable to retrieve the token, the Sync App reverts to IMDSv1.
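The token-then-fallback behavior described above can be sketched as follows. This is a minimal illustration, not the Sync App's implementation: the function names and the injected `http` callable are hypothetical, while the endpoint address and the two `X-aws-ec2-metadata-token*` headers are the ones AWS documents for IMDSv2.

```python
import urllib.request
from typing import Callable, Dict

IMDS_BASE = "http://169.254.169.254/latest"

# http(method, url, headers) returns the response body or raises OSError.
HttpFn = Callable[[str, str, Dict[str, str]], str]


def urllib_http(method: str, url: str, headers: Dict[str, str]) -> str:
    """Plain urllib transport; only reachable from inside an EC2 instance."""
    request = urllib.request.Request(url, method=method, headers=headers)
    with urllib.request.urlopen(request, timeout=2) as response:
        return response.read().decode()


def fetch_metadata(path: str, http: HttpFn = urllib_http) -> str:
    """Try IMDSv2 first; revert to an IMDSv1-style request if no token."""
    headers: Dict[str, str] = {}
    try:
        token = http("PUT", f"{IMDS_BASE}/api/token",
                     {"X-aws-ec2-metadata-token-ttl-seconds": "21600"})
        headers["X-aws-ec2-metadata-token"] = token
    except OSError:
        # Token could not be retrieved: fall back to IMDSv1 (no token header).
        pass
    # The metadata endpoint is the same in both versions.
    return http("GET", f"{IMDS_BASE}/meta-data/{path}", headers)
```

Passing the transport in as a parameter keeps the fallback logic testable without network access; on an EC2 instance, `fetch_metadata("iam/info")` would use the default transport.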
In many situations it may be preferable to use an IAM role for authentication instead of the direct security credentials of an AWS root user.
To authenticate as an AWS role, set the following:
Note: Roles may not be used when specifying the AWSAccessKey and AWSSecretKey of an AWS root user.
Set the AuthScheme to ADFS. The following connection properties need to be set:
AuthScheme=ADFS;User=username;Password=password;SSOLoginURL='https://sts.company.com';
ADFS Integrated
To use the ADFS Integrated flow, specify the SSOLoginURL and leave the username and password empty.
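The two ADFS variants differ only in whether User and Password are supplied. The sketch below assembles both connection strings from a dictionary; `build_connection_string` is a hypothetical helper, and the rule of quoting only values that contain `;` or `=` is an assumption made for this example rather than documented Sync App behavior.

```python
from typing import Mapping


def build_connection_string(props: Mapping[str, str]) -> str:
    """Join non-empty properties as 'Name=value;' pairs."""
    parts = []
    for name, value in props.items():
        if not value:
            continue  # leave empty properties out entirely
        if ";" in value or "=" in value:
            value = f"'{value}'"  # assumed quoting rule for nested values
        parts.append(f"{name}={value}")
    return ";".join(parts) + ";"


base = {"AuthScheme": "ADFS", "SSOLoginURL": "https://sts.company.com"}

# Standard ADFS flow: explicit credentials.
standard = build_connection_string(
    {**base, "User": "username", "Password": "password"})

# ADFS Integrated flow: User and Password are left empty.
integrated = build_connection_string({**base, "User": "", "Password": ""})

print(standard)
print(integrated)
```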
Set the AuthScheme to Okta. The following connection properties are used to authenticate through Okta:
then you need to use combinations of SSOProperties input parameters to authenticate using Okta. Otherwise, you do not need to set any of these values.
In SSOProperties when required, set these input parameters:
Example connection string:
AuthScheme=Okta;SSOLoginURL='https://example.okta.com/home/appType/0bg4ivz6cJRZgCz5d6/46';User=oktaUserName;Password=oktaPassword;
Set the AuthScheme to PingFederate. The following connection properties need to be set:
authScheme=pingfederate;SSOLoginURL=https://mycustomserver.com:9033/idp/sts.wst;SSOExchangeUrl=https://us-east-1.signin.aws.amazon.com/platform/saml/acs/764ef411-xxxxxx;user=admin;password=PassValue;AWSPrincipalARN=arn:aws:iam::215338515180:saml-provider/pingFederate;AWSRoleArn=arn:aws:iam::215338515180:role/SSOTest2;
For users and roles that require Multi-factor Authentication, specify the following to authenticate:
Note that you can control the duration of the temporary credentials by setting the TemporaryTokenDuration property (default 3600 seconds).
To authenticate using temporary credentials, specify the following:
The Sync App can now request resources using the same permissions provided by long-term credentials (such as IAM user credentials) for the lifespan of the temporary credentials.
If you are also using an IAM role to authenticate, you must additionally specify the following:
You can use a credentials file to authenticate. Any configurations related to AccessKey/SecretKey authentication, temporary credentials, role authentication, or MFA can be used. To do so, set the following properties to authenticate:
To obtain the credentials for an AzureBlob user, follow the steps below:
Set the AzureAccessKey connection property to the access key associated with the Azure blob to identify the user.
You can authenticate to Azure Blob Storage as an Azure AD user, with MSI authentication, or using an Azure Service Principal.
You can authenticate an Azure AD account using either an Azure Access Key or OAuth authentication.
Method 1: Storage Account and Access Key
Set the following to authenticate with an Azure Access Key:
Method 2: OAuth
Set the following to authenticate with OAuth:
If you are connecting from an Azure VM with permissions for Azure Blob storage, set the following:
A service principal can also authenticate with a client certificate instead of a client secret. Set the following to authenticate:
You can authenticate to Azure Data Lake Storage as an Azure AD user, with MSI authentication, or using an Azure Service Principal.
You can authenticate an Azure AD account using either an Azure Access Key or OAuth authentication.
Method 1: Storage Account and Access Key
Set the following to authenticate with an Azure Access Key:
Method 2: OAuth
Set the following to authenticate with OAuth:
If you are connecting from an Azure VM with permissions to connect to Azure Data Lake Storage, set the following:
A service principal can also authenticate with a client certificate instead of a client secret.
Use the OAuth authentication standard to connect to Box. You can authenticate with a user account or with a service account. A service account is required to grant organization-wide access scopes to the Sync App. The Sync App facilitates these authentication flows as described below.
AuthScheme must be set to OAuth in all user account flows.
When connecting via a Web application, you need to register a custom OAuth app with Box. You can then use the Sync App to get and manage the OAuth token values. See Create a Custom OAuth App for more information.
Get an OAuth Access Token
Set the following connection properties to obtain the OAuthAccessToken:
Then call stored procedures to complete the OAuth exchange:
Call the GetOAuthAuthorizationURL stored procedure. Set the CallbackURL input to the Redirect URI you specified in your app settings. The stored procedure returns the URL to the OAuth endpoint.
After you have obtained the access and refresh tokens, you can connect to data and refresh the OAuth access token either automatically or manually.
Automatic Refresh of the OAuth Access Token
To have the Sync App automatically refresh the OAuth access token, set the following on the first data connection.
Manual Refresh of the OAuth Access Token
The only value needed to manually refresh the OAuth access token when connecting to data is the OAuth refresh token. Use the RefreshOAuthAccessToken stored procedure to manually refresh the OAuthAccessToken after the ExpiresIn parameter value returned by GetOAuthAccessToken has elapsed, then set the following connection properties:
Then call RefreshOAuthAccessToken with OAuthRefreshToken set to the OAuth refresh token returned by GetOAuthAccessToken. After the new tokens have been retrieved, open a new connection by setting the OAuthAccessToken property to the value returned by RefreshOAuthAccessToken.
Finally, store the OAuth refresh token so that you can use it to manually refresh the OAuth access token after it has expired.
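For the manual flow, the caller has to track when the ExpiresIn interval has elapsed. The sketch below shows one way to keep that bookkeeping in Python. The `TokenState` class is hypothetical, it assumes ExpiresIn is expressed in seconds, and it does not call the stored procedures themselves; it only decides when a call to RefreshOAuthAccessToken is due. Keeping the original refresh token across refreshes is a simplification.

```python
import time
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TokenState:
    access_token: str
    refresh_token: str   # stored so later refreshes remain possible
    obtained_at: float   # epoch seconds when the access token was issued
    expires_in: float    # ExpiresIn value (assumed to be seconds)

    def needs_refresh(self, now: Optional[float] = None,
                      skew: float = 60.0) -> bool:
        """True once the token is within `skew` seconds of expiring."""
        now = time.time() if now is None else now
        return now >= self.obtained_at + self.expires_in - skew

    def refreshed(self, access_token: str, expires_in: float,
                  now: Optional[float] = None) -> "TokenState":
        """Record the values returned by a refresh call."""
        now = time.time() if now is None else now
        return replace(self, access_token=access_token,
                       obtained_at=now, expires_in=expires_in)


state = TokenState("access-1", "refresh-1", obtained_at=1000.0, expires_in=3600.0)
print(state.needs_refresh(now=2000.0))  # well inside the token lifetime
print(state.needs_refresh(now=4600.0))  # past the expiry time
```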
Set the AuthScheme to OAuthJWT to authenticate with this method.
Service accounts authenticate silently, without requiring the user to authenticate in a browser. You can also use a service account to delegate enterprise-wide access scopes to the Sync App.
You need to create an OAuth application in this flow. See Create a Custom OAuth App to create and authorize an app. You can then connect to Box data that the service account has permission to access.
After setting the following connection properties, you are ready to connect:
You may choose to use your own OAuth Application Credentials when you want to
Follow the steps below to create an OAuth application and generate a private key. You will then authorize the service account.
Authorize the application in the enterprise admin console: Navigate to Apps > Custom Apps Manager > Add App. In the Add App modal window, provide the Client Id and click Next to identify and verify the app.
Note: If you change the JWT access scopes, you will need to reauthorize the application in the enterprise admin console: Click Apps in the main menu and then select the ellipsis button next to your JWT application name. Select Reauthorize App in the menu.
Dropbox uses the OAuth authentication standard.
You need to choose between using CData's embedded OAuth app or Create a Custom OAuth App.
When connecting via a Web application, you need to register a custom OAuth app with Dropbox. You can then use the Sync App to get and manage the OAuth token values. See Create a Custom OAuth App for more information.
Get an OAuth Access Token
Set the following connection properties to obtain the OAuthAccessToken:
Then call stored procedures to complete the OAuth exchange:
Call the GetOAuthAuthorizationURL stored procedure. Set the CallbackURL input to the Redirect URI you specified in your app settings. The stored procedure returns the URL to the OAuth endpoint.
After you have obtained the access and refresh tokens, you can connect to data and refresh the OAuth access token either automatically or manually.
Automatic Refresh of the OAuth Access Token
To have the Sync App automatically refresh the OAuth access token, set the following on the first data connection.
Manual Refresh of the OAuth Access Token
The only value needed to manually refresh the OAuth access token when connecting to data is the OAuth refresh token. Use the RefreshOAuthAccessToken stored procedure to manually refresh the OAuthAccessToken after the ExpiresIn parameter value returned by GetOAuthAccessToken has elapsed, then set the following connection properties:
Then call RefreshOAuthAccessToken with OAuthRefreshToken set to the OAuth refresh token returned by GetOAuthAccessToken. After the new tokens have been retrieved, open a new connection by setting the OAuthAccessToken property to the value returned by RefreshOAuthAccessToken.
Finally, store the OAuth refresh token so that you can use it to manually refresh the OAuth access token after it has expired.
You may choose to use your own OAuth Application Credentials when you want to
No further values need to be specified in the Avro app settings.
Set the ProjectId property to the Id of the project you want to connect to.
The Sync App supports using user accounts and GCP instance accounts for authentication.
The following sections discuss the available authentication schemes for Google Cloud Storage:
AuthScheme must be set to OAuth in all user account flows.
CData provides an embedded OAuth application that simplifies OAuth desktop Authentication. Alternatively, you can create a custom OAuth application. See Create a Custom OAuth App for information about creating custom applications and reasons for doing so.
For authentication, the only difference between the two methods is that you must set two additional connection properties when using custom OAuth applications.
After setting the following connection properties, you are ready to connect:
Get an OAuth Access Token
Set the following connection properties to obtain the OAuthAccessToken:
Then call stored procedures to complete the OAuth exchange:
Once you have obtained the access and refresh tokens, you can connect to data and refresh the OAuth access token either automatically or manually.
Automatic Refresh of the OAuth Access Token
To have the Sync App automatically refresh the OAuth access token, set the following on the first data connection:
Manual Refresh of the OAuth Access Token
The only value needed to manually refresh the OAuth access token when connecting to data is the OAuth refresh token.
Use the RefreshOAuthAccessToken stored procedure to manually refresh the OAuthAccessToken after the ExpiresIn parameter value returned by GetOAuthAccessToken has elapsed, then set the following connection properties:
Then call RefreshOAuthAccessToken with OAuthRefreshToken set to the OAuth refresh token returned by GetOAuthAccessToken. After the new tokens have been retrieved, open a new connection by setting the OAuthAccessToken property to the value returned by RefreshOAuthAccessToken.
Finally, store the OAuth refresh token so that you can use it to manually refresh the OAuth access token after it has expired.
Option 1: Obtain and Exchange a Verifier Code
To obtain a verifier code, you must authenticate at the OAuth authorization URL.
Follow the steps below to authenticate from the machine with an internet browser and obtain the OAuthVerifier connection property.
On the headless machine, set the following connection properties to obtain the OAuth authentication values:
After the OAuth settings file is generated, you need to re-set the following properties to connect:
Option 2: Transfer OAuth Settings
Prior to connecting on a headless machine, you need to create and install a connection with the driver on a device that supports an internet browser. Set the connection properties as described in "Desktop Applications" above.
After completing the instructions in "Desktop Applications", the resulting authentication values are encrypted and written to the path specified by OAuthSettingsLocation. The default filename is OAuthSettings.txt.
Once you have successfully tested the connection, copy the OAuth settings file to your headless machine.
On the headless machine, set the following connection properties to connect to data:
When running on a GCP virtual machine, the Sync App can authenticate using a service account tied to the virtual machine. To use this mode, set AuthScheme to GCPInstanceAccount.
You may choose to use your own OAuth Application Credentials when you want to
Follow these steps to create a custom OAuth application:
The Sync App supports using user accounts and GCP instance accounts for authentication.
The following sections discuss the available authentication schemes for Google Drive:
AuthScheme must be set to OAuth in all user account flows.
Get an OAuth Access Token
Set the following connection properties to obtain the OAuthAccessToken:
Then call stored procedures to complete the OAuth exchange:
Once you have obtained the access and refresh tokens, you can connect to data and refresh the OAuth access token either automatically or manually.
Automatic Refresh of the OAuth Access Token
To have the Sync App automatically refresh the OAuth access token, set the following on the first data connection:
Manual Refresh of the OAuth Access Token
The only value needed to manually refresh the OAuth access token when connecting to data is the OAuth refresh token.
Use the RefreshOAuthAccessToken stored procedure to manually refresh the OAuthAccessToken after the ExpiresIn parameter value returned by GetOAuthAccessToken has elapsed, then set the following connection properties:
Then call RefreshOAuthAccessToken with OAuthRefreshToken set to the OAuth refresh token returned by GetOAuthAccessToken. After the new tokens have been retrieved, open a new connection by setting the OAuthAccessToken property to the value returned by RefreshOAuthAccessToken.
Finally, store the OAuth refresh token so that you can use it to manually refresh the OAuth access token after it has expired.
When running on a GCP virtual machine, the Sync App can authenticate using a service account tied to the virtual machine. To use this mode, set AuthScheme to GCPInstanceAccount.
The Sync App facilitates the following OAuth authentication flows:
This OAuth flow requires the authenticating user to interact with Google using the browser. The Sync App facilitates this in various ways as described below.
When you connect, the Sync App opens the OAuth endpoint in your default browser. Log in and grant permissions to the application. The Sync App then completes the OAuth process:
When connecting via a Web application or if the Sync App is not authorized to open a browser window, you exchange a verifier code for the access token.
To begin, you need to register an OAuth app with Google and set the following connection properties.
Once you have registered an app and set OAuthClientId and OAuthClientSecret, you can exchange a verifier code for the access token.
Log in at the OAuth endpoint and authorize the application. You are redirected back via the callback URL.
The verifier code is appended to the callback URL in a query string parameter named "code". Extract the verifier code.
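Extracting the verifier code from the callback URL can be done with Python's standard library, as in the sketch below. The function name and the sample callback URL are hypothetical; only the query string parameter name "code" comes from the description above.

```python
from urllib.parse import parse_qs, urlsplit


def extract_verifier_code(callback_url: str) -> str:
    """Return the value of the 'code' query parameter in a callback URL."""
    query = parse_qs(urlsplit(callback_url).query)
    codes = query.get("code")
    if not codes:
        raise ValueError("callback URL has no 'code' parameter")
    return codes[0]


# Hypothetical redirect received after authorizing the application.
print(extract_verifier_code("http://localhost:33333/?code=abc123&state=xyz"))
```

`parse_qs` also percent-decodes the value, so the returned code can be passed on as-is.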
You can use a service account in this OAuth flow to access Google APIs on behalf of users in a domain. A domain administrator can delegate domain-wide access to the service account.
To complete the service account flow, generate a private key in the Google APIs Console. In the service account flow, the Sync App exchanges a JSON Web token (JWT) for the OAuthAccessToken. The private key is required to sign the JWT. The OAuthAccessToken authenticates that the Sync App has the same permissions granted to the service account.
Follow the steps below to generate a private key and obtain the credentials for your application:
After setting the following connection properties, you are ready to connect:
AuthScheme: Set this to Basic.
AuthScheme: Set this to Digest.
AuthScheme: Set this to OAuth.
AuthScheme: Set this to OAuthJWT.
AuthScheme: Set this to OAuthPassword.
AuthScheme: Set this to OAuthClient.
AuthScheme: Set this to OAuthPKCE.
If you do not already have Cloud Object Storage in your IBM Cloud account, you can follow the procedure below to create an instance of Cloud Object Storage in your account:
To connect with IBM Cloud Object Storage, you will need an ApiKey. You can obtain this as follows:
You can authenticate to IBM Cloud Object Storage using either HMAC or OAuth authentication.
Set the following properties to authenticate:
ConnectionType=IBM Object Storage Source;URI=ibmobjectstorage://bucket1/folder1; AccessKey=token1; SecretKey=secret1; Region=eu-gb;
Optionally, specify Region in addition.
Set the following to authenticate using OAuth authentication.
ConnectionType=IBM Object Storage Source;URI=ibmobjectstorage://bucket1/folder1; ApiKey=key1; Region=eu-gb; AuthScheme=OAuth; InitiateOAuth=GETANDREFRESH;
When you connect, the Sync App completes the OAuth process.
Avro uses the OAuth authentication standard. To authenticate using OAuth, you will need to create an app to obtain the OAuthClientId, OAuthClientSecret, and CallbackURL connection properties.
SSHAuthMode: Set this to None.
SSHAuthMode: Set this to Password.
SSHAuthMode: Set this to Public_Key.
Service provider | Okta | OneLogin | ADFS | AzureAD |
Amazon S3 | Y | Y | Y | |
Azure Blob Storage | | | | |
Azure Data Lake Store Gen1 | | | | |
Azure Data Lake Store Gen2 | | | | |
Azure Data Lake Store Gen2 with SSL | | | | |
Google Drive | | | | |
OneDrive | | | | |
Box | | | | |
Dropbox | | | | |
SharePoint Online SOAP | Y | Y | Y | |
SharePoint Online REST | | | | |
Wasabi | | | | |
Google Cloud Storage | | | | |
Oracle Cloud Storage | | | | |
Azure File | | | | |
Azure AD Configuration
The main theme behind this configuration is the OAuth 2.0 On-Behalf-Of flow. It requires two Azure AD applications:
Save the step "Assign the Azure AD test user" until after provisioning so that you can select the AWS roles when assigning the user.
CData Driver Common Properties
The following SSOProperties are needed to authenticate to Azure Active Directory and must be specified for every service provider.
We will retrieve the SSO SAML response from an OAuth 2.0 On-Behalf-Of flow so the following OAuth connection properties must be specified:
Amazon S3
In addition to the common properties, the following properties must be specified when connecting to the Amazon S3 service provider:
AuthScheme=AzureAD;InitiateOAuth=GETANDREFRESH;OAuthClientId=d593a1d-ad89-4457-872d-8d7443aaa655;OauthClientSecret=g9-oy5D_rl9YEKfN-45~3Wm8FgVa2F;SSOProperties='Tenant=94be7-edb4-4fda-ab12-95bfc22b232f;Resource=https://signin.aws.amazon.com/saml;';AWSRoleARN=arn:aws:iam::2153385180:role/AWS_AzureAD;AWSPrincipalARN=arn:aws:iam::215515180:saml-provider/AzureAD;
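Connection strings such as the one above nest a second property list inside the quoted SSOProperties value. The following sketch parses that shape. `parse_connection_string` is a hypothetical helper written for this example, it assumes single quotes are the only quoting mechanism, and the property values shown are shortened placeholders rather than real credentials.

```python
from typing import Dict


def parse_connection_string(text: str) -> Dict[str, str]:
    """Split 'Name=value;' pairs, keeping ';' and '=' inside single quotes."""
    props: Dict[str, str] = {}
    name, buffer, in_quote = None, [], False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote          # quotes delimit, but are dropped
        elif ch == "=" and name is None and not in_quote:
            name, buffer = "".join(buffer).strip(), []
        elif ch == ";" and not in_quote:
            if name:
                props[name] = "".join(buffer).strip()
            name, buffer = None, []
        else:
            buffer.append(ch)
    if name:                                  # final pair without trailing ';'
        props[name] = "".join(buffer).strip()
    return props


conn = ("AuthScheme=AzureAD;InitiateOAuth=GETANDREFRESH;"
        "SSOProperties='Tenant=my-tenant-id;"
        "Resource=https://signin.aws.amazon.com/saml;';"
        "AWSRoleARN=arn:aws:iam::000000000000:role/AWS_AzureAD;")

props = parse_connection_string(conn)
sso = parse_connection_string(props["SSOProperties"])  # nested property list
print(props["AuthScheme"], sso["Tenant"], sso["Resource"])
```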
OneLogin Configuration
You must create an application used for the single sign-on process to a specific provider.
SharePoint SOAP
The following properties must be specified when connecting to the SharePoint SOAP service provider:
AuthScheme='OneLogin';User=test;Password=test;SSOProperties='Domain=test.cdata;';
Okta Configuration
You must create an application used for the single sign-on process to a specific provider.
SharePoint SOAP
The following properties must be specified when connecting to the SharePoint SOAP service provider:
AuthScheme='Okta';User=test;Password=test;SSOProperties='Domain=test.cdata;';
Amazon S3
The following properties must be specified when connecting to an Amazon S3 service provider:
AuthScheme=Okta;User=OktaUser;Password=OktaPassword;SSOLoginURL='https://{subdomain}.okta.com/home/amazon_aws/0oan2hZLgQiy5d6/272';
ADFS Configuration
You must create an application used for the single sign-on process to a specific provider.
SharePoint SOAP
The following properties must be specified when connecting to the SharePoint SOAP service provider:
AuthScheme='ADFS';User=test;Password=test;SSOProperties='Domain=test.cdata;';
Amazon S3
The following properties must be specified when connecting to an Amazon S3 service provider:
AuthScheme=ADFS;User=username;Password=password;SSOLoginURL='https://sts.company.com';
ADFS Integrated
The ADFS Integrated flow indicates you are connecting with the currently logged in Windows user credentials. To use the ADFS Integrated flow, simply do not specify the User and Password, but otherwise follow the same steps in the ADFS guide above.
This section shows how to use the Sync App to authenticate using Kerberos.
To authenticate to Avro using Kerberos, set the following properties:
You can use one of the following options to retrieve the required Kerberos ticket.
This option enables you to use the MIT Kerberos Ticket Manager or kinit command to get tickets. Note that you do not need to set the User or Password connection properties with this option.
As an alternative to setting the KRB5CCNAME environment variable, you can directly set the file path using the KerberosTicketCache property. When set, the Sync App uses the specified cache file to obtain the Kerberos ticket to connect to Avro.
If the KRB5CCNAME environment variable has not been set, you can retrieve a Kerberos ticket using a Keytab File. To do so, set the User property to the desired username and set the KerberosKeytabFile property to a file path pointing to the keytab file associated with the user.
If neither the KRB5CCNAME environment variable nor the KerberosKeytabFile property has been set, you can retrieve a ticket using a user and password combination. To do this, set the User and Password properties to the user/password combination that you use to authenticate with Avro.
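The order in which the ticket options above apply can be summarized in a short sketch. The function `choose_ticket_source` and its return labels are hypothetical and only mirror the decision order described in this section; the property and environment variable names are the ones used above.

```python
import os
from typing import Mapping, Optional


def choose_ticket_source(props: Mapping[str, str],
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Return which option would supply the Kerberos ticket."""
    env = os.environ if env is None else env
    # Option 1: an existing ticket cache (environment variable or property).
    if props.get("KerberosTicketCache") or env.get("KRB5CCNAME"):
        return "ticket_cache"
    # Option 2: a keytab file associated with the user.
    if props.get("User") and props.get("KerberosKeytabFile"):
        return "keytab"
    # Option 3: a user and password combination.
    if props.get("User") and props.get("Password"):
        return "password"
    raise ValueError("no Kerberos ticket source is configured")


print(choose_ticket_source(
    {"User": "svc", "KerberosKeytabFile": "/etc/security/svc.keytab"},
    env={}))
```

The keytab path and user name in the usage line are placeholders.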
More complex Kerberos environments may require cross-realm authentication where multiple realms and KDC servers are used (e.g., where one realm/KDC is used for user authentication and another realm/KDC is used for obtaining the service ticket).
In such an environment, set the KerberosRealm and KerberosKDC properties to the values required for user authentication. Also set the KerberosServiceRealm and KerberosServiceKDC properties to the values required to obtain the service ticket.
This section shows how to control the various schemes that the Sync App offers to bridge the gap between relational SQL and nested Avro data. The CData Sync App provides a managed way for you to use the two prevailing techniques for dealing with nested Avro data:
By default, the Sync App automatically detects the rows in a document, so that you do not need to know the structure of the underlying data to query it with SQL. Set the DataModel property to choose a basic configuration of how the Sync App models object arrays into tables. Set the FlattenObjects and FlattenArrays properties to configure how nested data is flattened into columns. See Parsing Hierarchical Data for a guide.
Below is the raw data used throughout this chapter. The data includes entries for people, the cars they own, and various maintenance services performed on those cars:
{ "type": "record", "name": "People", "fields": [ { "name": "personal", "type": [ "null", { "type": "record", "name": "Personal", "namespace": "root", "fields": [ { "name": "name", "type": [ "null", { "type": "record", "name": "Name", "namespace": "root.personal", "fields": [ { "name": "last", "type": ["null", "string"] }, { "name": "first", "type": ["null", "string"] } ] } ] }, { "name": "gender", "type": ["null", "string"] }, { "name": "age", "type": ["null", "long"] } ] } ] }, { "name": "vehicles", "type": { "type": "array", "items": { "type": "record", "name": "Vehicles", "namespace": "root", "fields": [ { "name": "insurance", "type": [ "null", { "type": "record", "name": "Insurance", "namespace": "root.vehicles", "fields": [ { "name": "policy_num", "type": ["null", "long"] }, { "name": "company", "type": ["null", "string"] } ] } ] }, { "name": "maintenance", "type": { "type": "array", "items": { "type": "record", "name": "Maintenance", "namespace": "root.vehicles", "fields": [ { "name": "desc", "type": ["null", "string"] }, { "name": "date", "type": ["null", "string"] } ] } } }, { "name": "model", "type": ["null", "string"] }, { "name": "type", "type": ["null", "string"] } ] } } }, { "name": "source", "type": ["null", "string"] } ] }
The following is the sample data set for the "People" table:
{ "people": [ { "personal": { "age": 20, "gender": "M", "name": { "first": "John", "last": "Doe" } }, "vehicles": [ { "type": "car", "model": "Honda Civic", "insurance": { "company": "ABC Insurance", "policy_num": "12345" }, "maintenance": [ { "date": "07-17-2017", "desc": "oil change" }, { "date": "01-03-2018", "desc": "new tires" } ] }, { "type": "truck", "model": "Dodge Ram", "insurance": { "company": "ABC Insurance", "policy_num": "12345" }, "maintenance": [ { "date": "08-27-2017", "desc": "new tires" }, { "date": "01-08-2018", "desc": "oil change" } ] } ], "source": "internet" }, { "personal": { "age": 24, "gender": "F", "name": { "first": "Jane", "last": "Roberts" } }, "vehicles": [ { "type": "car", "model": "Toyota Camry", "insurance": { "company": "Car Insurance", "policy_num": "98765" }, "maintenance": [ { "date": "05-11-2017", "desc": "tires rotated" }, { "date": "11-03-2017", "desc": "oil change" } ] }, { "type": "car", "model": "Honda Accord", "insurance": { "company": "Car Insurance", "policy_num": "98765" }, "maintenance": [ { "date": "10-07-2017", "desc": "new air filter" }, { "date": "01-13-2018", "desc": "new brakes" } ] } ], "source": "phone" } ] }
By default, the Sync App automatically infers a relational schema by inspecting the Avro data. This section describes the connection properties available to configure these dynamic schemas.
The columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties. If FlattenObjects is set (this is the default), nested objects will be flattened into a series of columns.
To provide an example of how these options work, consider the following schema:
{ "type" : "record", "name" : "Root", "fields" : [ { "name" : "id", "type" : [ "null", "long" ] }, { "name" : "name", "type" : [ "null", "string" ] }, { "name" : "annual_revenue", "type" : [ "null", "long" ] }, { "name" : "offices", "type" : { "type" : "array", "items" : "string" } }, { "name" : "address", "type" : [ "null", { "type" : "record", "name" : "Address", "namespace" : "root", "fields" : [ { "name" : "city", "type" : [ "null", "string" ] }, { "name" : "state", "type" : [ "null", "string" ] }, { "name" : "street", "type" : [ "null", "string" ] } ] } ] }] }
Also consider the following example data for the above schema:
{ "id": 12, "name": "Lohia Manufacturers Inc.", "annual_revenue": 35600000, "offices": [ "Chapel Hill", "London", "New York" ], "address": { "city": "Chapel Hill", "state": "NC", "street": "Main Street" } }
If FlattenObjects is set, all nested objects will be flattened into a series of columns. The above example will be represented by the following columns:
Column Name | Data Type | Example Value |
id | Integer | 12 |
name | String | Lohia Manufacturers Inc. |
address.street | String | Main Street |
address.city | String | Chapel Hill |
address.state | String | NC |
offices | String | ["Chapel Hill", "London", "New York"] |
annual_revenue | Double | 35,600,000 |
If FlattenObjects is not set, then the address.street, address.city, and address.state columns will not be broken apart. The address column of type string will instead represent the entire object. Its value would be the following:
{street: "Main Street", city: "Chapel Hill", state: "NC"}
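The effect of FlattenObjects can be pictured with a small Python sketch. This is only an illustration of the column naming described above, not the Sync App's implementation: the function name `flatten_objects` is hypothetical, and the sketch serializes unflattened values as JSON text, which differs slightly from the notation shown above.

```python
import json


def flatten_objects(record, flatten=True, prefix=""):
    """Map one record to a dict of column name -> value."""
    columns = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict) and flatten:
            # Nested object: recurse, joining path segments with dots.
            columns.update(flatten_objects(value, flatten, name + "."))
        elif isinstance(value, (dict, list)):
            # Unflattened objects and all arrays become one string column.
            columns[name] = json.dumps(value)
        else:
            columns[name] = value
    return columns


record = {
    "id": 12,
    "name": "Lohia Manufacturers Inc.",
    "annual_revenue": 35600000,
    "offices": ["Chapel Hill", "London", "New York"],
    "address": {"city": "Chapel Hill", "state": "NC", "street": "Main Street"},
}

flattened = flatten_objects(record)                  # address.city, address.state, ...
unflattened = flatten_objects(record, flatten=False)  # single "address" column
print(sorted(flattened))
```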
The FlattenArrays property can be used to flatten array values into columns of their own. This is only recommended for arrays that are expected to be short, for example the coordinates below:
"coord": [ -73.856077, 40.848447 ]
The FlattenArrays property can be set to 2 to represent the array above as follows:
Column Name | Data Type | Example Value |
coord.0 | Float | -73.856077 |
coord.1 | Float | 40.848447 |
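As a rough sketch of the same idea in Python (the helper name `flatten_arrays` is hypothetical, and dropping elements beyond the limit is an assumption based on the property description), the first N array elements become indexed columns:

```python
import json


def flatten_arrays(record, max_elements):
    """Expand the first max_elements items of each array into indexed columns."""
    columns = {}
    for key, value in record.items():
        if isinstance(value, list) and max_elements > 0:
            # Column names follow the "<array>.<index>" pattern shown above.
            for index, element in enumerate(value[:max_elements]):
                columns[f"{key}.{index}"] = element
        elif isinstance(value, list):
            # Flattening disabled: keep the array as one aggregated string.
            columns[key] = json.dumps(value)
        else:
            columns[key] = value
    return columns


row = flatten_arrays({"coord": [-73.856077, 40.848447]}, 2)
aggregated = flatten_arrays({"coord": [-73.856077, 40.848447]}, 0)
print(row)
```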
The Sync App offers three basic configurations to model object arrays as tables, described in the following sections. The Sync App will parse the document and identify the object arrays.
For users who simply need access to the entirety of their Avro data, flattening the data into a single table is the best option. The Sync App uses streaming and parses the data only once per query in this mode.
With DataModel set to "FlattenedDocuments", each nested object array is implicitly joined to its parent, in the same manner as a SQL JOIN. Any nested sibling values (child paths at the same height) are treated as a SQL CROSS JOIN.
Below is a sample query and the results, based on the sample document in Raw Data. This implicitly JOINs the people collection with the vehicles collection and implicitly JOINs the vehicles collection with the maintenance collection.
Use the following connection string to query the Raw Data in this example.
URI=C:\people.avro;DataModel=FlattenedDocuments;
The following query drills into the nested elements in each people object.
SELECT
[personal.age] AS age,
[personal.gender] AS gender,
[personal.name.first] AS name_first,
[personal.name.last] AS name_last,
[source],
[type],
[model],
[insurance.company] AS ins_company,
[insurance.policy_num] AS ins_policy_num,
[date] AS maint_date,
[desc] AS maint_desc
FROM
[people]
With horizontal and vertical flattening based on the described paths, each vehicle object is implicitly JOINed to its parent people object and each maintenance object is implicitly JOINed to its parent vehicle object.
age | gender | name_first | name_last | source | type | model | ins_company | ins_policy_num | maint_date | maint_desc |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2017-07-17 | oil change |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2018-01-03 | new tires |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2017-08-27 | new tires |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2018-01-08 | oil change |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-05-11 | tires rotated |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-11-03 | oil change |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2017-10-07 | new air filter |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2018-01-13 | new brakes |
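The implicit joins can be approximated with nested loops. The Python sketch below uses a trimmed copy of the sample document and a hypothetical `flattened_rows` helper; it is meant only to show why two people with two vehicles and two maintenance entries each produce eight rows, and it does not model how the Sync App handles empty arrays.

```python
people = [
    {"personal": {"name": {"first": "John", "last": "Doe"}}, "source": "internet",
     "vehicles": [
         {"model": "Honda Civic", "maintenance": [
             {"date": "07-17-2017", "desc": "oil change"},
             {"date": "01-03-2018", "desc": "new tires"}]},
         {"model": "Dodge Ram", "maintenance": [
             {"date": "08-27-2017", "desc": "new tires"},
             {"date": "01-08-2018", "desc": "oil change"}]}]},
    {"personal": {"name": {"first": "Jane", "last": "Roberts"}}, "source": "phone",
     "vehicles": [
         {"model": "Toyota Camry", "maintenance": [
             {"date": "05-11-2017", "desc": "tires rotated"},
             {"date": "11-03-2017", "desc": "oil change"}]},
         {"model": "Honda Accord", "maintenance": [
             {"date": "10-07-2017", "desc": "new air filter"},
             {"date": "01-13-2018", "desc": "new brakes"}]}]},
]


def flattened_rows(documents):
    """Emit one row per maintenance entry, repeating the parent values."""
    rows = []
    for person in documents:
        for vehicle in person["vehicles"]:          # people JOIN vehicles
            for entry in vehicle["maintenance"]:    # vehicles JOIN maintenance
                rows.append({
                    "name_first": person["personal"]["name"]["first"],
                    "source": person["source"],
                    "model": vehicle["model"],
                    "maint_date": entry["date"],
                    "maint_desc": entry["desc"],
                })
    return rows


rows = flattened_rows(people)
print(len(rows))
```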
Using a top-level document view of the data provides ready access to top-level elements. The Sync App returns nested elements in aggregate, as single columns.
One aspect to consider is performance. You forgo the time and resources to process and parse nested elements -- the Sync App parses the returned data once, using streaming to read the data. Another consideration is your need to access any data stored in nested parent elements, and the ability of your tool or application to process the data.
With DataModel set to "Document" (the default), the Sync App scans only a single object array, the top-level object array by default. The top-level object elements are available as columns due to the default object flattening. Nested object arrays are returned as aggregated strings.
Below is a sample query and the results, based on the sample document in Raw Data. The query results in a single "people" table.
Set the DataModel connection property to "Document" to perform the following query and see the example result set.
URI=C:\people.avro;DataModel=Document;
The following query pulls the top-level object elements and the vehicles array into the results.
SELECT
[personal.age] AS age,
[personal.gender] AS gender,
[personal.name.first] AS name_first,
[personal.name.last] AS name_last,
[source],
[vehicles]
FROM
[people]
With a document view of the data, the personal object is flattened into 4 columns and the source and vehicles elements are returned as individual columns, resulting in a table with 6 columns.
age | gender | name_first | name_last | source | vehicles |
20 | M | John | Doe | internet | [{"type":"car","model":"Honda Civic","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"07-17-2017","desc":"oil change"},{"date":"01-03-2018","desc":"new tires"}]},{"type":"truck","model":"Dodge Ram","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"08-27-2017","desc":"new tires"},{"date":"01-08-2018","desc":"oil change"}]}] |
24 | F | Jane | Roberts | phone | [{"type":"car","model":"Toyota Camry","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"05-11-2017","desc":"tires rotated"},{"date":"11-03-2017","desc":"oil change"}]},{"type":"car","model":"Honda Accord","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"10-07-2017","desc":"new air filter"},{"date":"01-13-2018","desc":"new brakes"}]}] |
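A minimal Python sketch of the document view follows, assuming a single trimmed record from the sample data and a hypothetical `document_rows` helper. It flattens the top-level personal object into columns and leaves the vehicles array as one aggregated string; the compact JSON separators are chosen only to resemble the output above.

```python
import json

people = [
    {
        "personal": {"age": 20, "gender": "M",
                     "name": {"first": "John", "last": "Doe"}},
        "vehicles": [
            {"type": "car", "model": "Honda Civic",
             "maintenance": [{"date": "07-17-2017", "desc": "oil change"}]},
        ],
        "source": "internet",
    },
]


def document_rows(documents):
    """One row per top-level object; nested arrays stay aggregated."""
    rows = []
    for person in documents:
        personal = person["personal"]
        rows.append({
            "age": personal["age"],
            "gender": personal["gender"],
            "name_first": personal["name"]["first"],
            "name_last": personal["name"]["last"],
            "source": person["source"],
            # The nested object array is returned as a single string value.
            "vehicles": json.dumps(person["vehicles"], separators=(",", ":")),
        })
    return rows


rows = document_rows(people)
print(rows[0]["vehicles"])
```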
The CData Sync App can be configured to create a relational model of the data, treating nested object arrays as individual tables containing a primary key and a foreign key that links to the parent document. This is particularly useful if you need to work with your data in existing BI, reporting, and ETL tools that expect a relational data model.
With DataModel set to "Relational", any JOINs are controlled by the query. Any time you perform a JOIN query, the file or source will be queried once for each table (nested array) included in the query.
Below is a sample query against the sample document in Raw Data, using a relational model.
URI=C:\people.avro;DataModel=Relational;
The following query explicitly JOINs the people, vehicles, and maintenance tables.
SELECT
[people].[personal.age] AS age,
[people].[personal.gender] AS gender,
[people].[personal.name.first] AS first_name,
[people].[personal.name.last] AS last_name,
[people].[source],
[vehicles].[type],
[vehicles].[model],
[vehicles].[insurance.company] AS ins_company,
[vehicles].[insurance.policy_num] AS ins_policy_num,
[maintenance].[date] AS maint_date,
[maintenance].[desc] AS maint_desc
FROM
[people]
JOIN
[vehicles]
ON
[people].[_id] = [vehicles].[people_id]
JOIN
[maintenance]
ON
[vehicles].[_id] = [maintenance].[vehicles_id]
In the example query, each maintenance object is JOINed to its parent vehicle object, which is JOINed to its parent people object to produce a table with 8 rows (2 maintenance entries for each of 2 vehicles for each of 2 people).
age | gender | first_name | last_name | source | type | model | ins_company | ins_policy_num | maint_date | maint_desc |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2017-07-17 | oil change |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2018-01-03 | new tires |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2017-08-27 | new tires |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2018-01-08 | oil change |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-05-11 | tires rotated |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-11-03 | oil change |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2017-10-07 | new air filter |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2018-01-13 | new brakes |
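The relational model can be sketched in Python as three tables linked by keys. The key columns `_id`, `people_id`, and `vehicles_id` come from the query above; the sequential key values and the `build_tables` helper are assumptions made for illustration, since the source does not say how the Sync App generates keys.

```python
people = [
    {"source": "internet", "vehicles": [
        {"model": "Honda Civic",
         "maintenance": [{"desc": "oil change"}, {"desc": "new tires"}]},
        {"model": "Dodge Ram",
         "maintenance": [{"desc": "new tires"}, {"desc": "oil change"}]}]},
    {"source": "phone", "vehicles": [
        {"model": "Toyota Camry",
         "maintenance": [{"desc": "tires rotated"}, {"desc": "oil change"}]},
        {"model": "Honda Accord",
         "maintenance": [{"desc": "new air filter"}, {"desc": "new brakes"}]}]},
]


def build_tables(documents):
    """Split nested object arrays into tables with primary and foreign keys."""
    people_t, vehicles_t, maintenance_t = [], [], []
    for person in documents:
        person_id = len(people_t) + 1  # assumed surrogate key
        people_t.append({"_id": person_id, "source": person["source"]})
        for vehicle in person["vehicles"]:
            vehicle_id = len(vehicles_t) + 1
            vehicles_t.append({"_id": vehicle_id, "people_id": person_id,
                               "model": vehicle["model"]})
            for entry in vehicle["maintenance"]:
                maintenance_t.append({"_id": len(maintenance_t) + 1,
                                      "vehicles_id": vehicle_id,
                                      "desc": entry["desc"]})
    return people_t, vehicles_t, maintenance_t


people_t, vehicles_t, maintenance_t = build_tables(people)

# Explicit joins, mirroring the ON clauses of the query above.
joined = [
    (p["source"], v["model"], m["desc"])
    for p in people_t
    for v in vehicles_t if v["people_id"] == p["_id"]
    for m in maintenance_t if m["vehicles_id"] == v["_id"]
]
print(len(joined))
```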
This section details a selection of advanced features of the Avro Sync App.
The Sync App allows you to define virtual tables, called user defined views, whose contents are decided by a pre-configured query. These views are useful when you cannot directly control queries being issued to the drivers. See User Defined Views for an overview of creating and configuring custom views.
Use SSL Configuration to adjust how the Sync App handles TLS/SSL certificate negotiations. You can choose from various certificate formats; see the SSLServerCert property under "Connection String Options" for more information.
Configure the Sync App for compliance with Firewall and Proxy, including Windows proxies and HTTP proxies. You can also set up tunnel connections.
The Sync App offloads as much of the SELECT statement processing as possible to Avro and then processes the rest of the query in memory (client-side).
See Query Processing for more information.
See Logging for an overview of configuration settings that can be used to refine CData logging. For basic logging, you only need to set two connection properties, but there are numerous features that support more refined logging, where you can select subsets of information to be logged using the LogModules connection property.
By default, the Sync App attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store.
To specify another certificate, see the SSLServerCert property for the available formats to do so.
The Avro Sync App also supports setting client certificates. Set the following to connect using a client certificate.
To connect through the Windows system proxy, you do not need to set any additional connection properties. To connect to other proxies, set ProxyAutoDetect to false.
In addition, to authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword, in addition to ProxyServer and ProxyPort.
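As an illustration, the proxy properties named above could be assembled into a connection string fragment as in the Python sketch below. The helper `to_connection_string` is hypothetical, and the host, port, credentials, and the BASIC scheme value are placeholder assumptions rather than values taken from this documentation. The sketch does not escape values that themselves contain semicolons.

```python
def to_connection_string(properties):
    """Serialize properties in the name=value; form used in this document."""
    return "".join(f"{name}={value};" for name, value in properties.items())


proxy_settings = {
    "ProxyAutoDetect": "false",          # required before custom proxy settings apply
    "ProxyServer": "proxy.example.com",  # placeholder host
    "ProxyPort": 8080,                   # placeholder port
    "ProxyAuthScheme": "BASIC",          # assumed scheme value
    "ProxyUser": "proxyuser",            # placeholder credentials
    "ProxyPassword": "proxypassword",
}

connection_string = to_connection_string(proxy_settings)
print(connection_string)
```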
Set the following properties:
This section shows the available API objects and provides more information on executing SQL to Avro APIs.
The connection string properties are the various options that can be used to establish a connection. This section provides a complete list of the options you can configure in the connection string for this provider. Click the links for further details.
For more information on establishing a connection, see Establishing a Connection.
Property | Description |
AuthScheme | The type of authentication to use when connecting to remote services. |
AccessKey | Your account access key. This value is accessible from your security credentials page. |
SecretKey | Your account secret key. This value is accessible from your security credentials page. |
ApiKey | The API Key used to identify the user to IBM Cloud. |
User | The user account used to authenticate. |
Password | The password used to authenticate the user. |
SharePointEdition | The edition of SharePoint being used. Set either SharePointOnline or SharePointOnPremise. |
Property | Description |
ConnectionType | The type of connection to use. |
URI | The Uniform Resource Identifier (URI) for the Avro resource location. |
DataModel | Specifies the data model to use when parsing Avro documents and generating the database metadata. |
Region | The hosting region for your S3-like Web Services. |
ProjectId | The Id of the project where your Google Cloud Storage instance resides. |
OracleNamespace | The Oracle Cloud Object Storage namespace to use. |
StorageBaseURL | The URL of a cloud storage service provider. |
UseVirtualHosting | If true (default), buckets are referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the Sync App uses the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for an S3-based custom service when the CustomURL is specified. |
Property | Description |
AWSAccessKey | Your AWS account access key. This value is accessible from your AWS security credentials page. |
AWSSecretKey | Your AWS account secret key. This value is accessible from your AWS security credentials page. |
AWSRoleARN | The Amazon Resource Name of the role to use when authenticating. |
AWSPrincipalARN | The ARN of the SAML Identity provider in your AWS account. |
AWSRegion | The hosting region for your Amazon Web Services. |
AWSCredentialsFile | The path to the AWS Credentials File to be used for authentication. |
AWSCredentialsFileProfile | The name of the profile to be used from the supplied AWSCredentialsFile. |
AWSSessionToken | Your AWS session token. |
MFASerialNumber | The serial number of the MFA device if one is being used. |
MFAToken | The temporary token available from your MFA device. |
ServerSideEncryption | When activated, file uploads into Amazon S3 buckets will be server-side encrypted. |
Property | Description |
AzureStorageAccount | The name of your Azure storage account. |
AzureAccessKey | The storage key associated with your Azure storage account. |
AzureSharedAccessSignature | A shared access key signature that may be used for authentication. |
AzureTenant | The Microsoft Online tenant being used to access data. If not specified, your default tenant will be used. |
AzureEnvironment | The Azure Environment to use when establishing a connection. |
Property | Description |
SSOLoginURL | The identity provider's login URL. |
SSOProperties | Additional properties required to connect to the identity provider in a semicolon-separated list. |
Property | Description |
OAuthVersion | The version of OAuth being used. |
OAuthClientId | The client Id assigned when you register your application with an OAuth authorization server. |
OAuthClientSecret | The client secret assigned when you register your application with an OAuth authorization server. |
Scope | Specify scope to obtain the initial access and refresh token. |
OAuthGrantType | The grant type for the OAuth flow. |
OAuthPasswordGrantMode | How to pass the Client Id and Secret when OAuthGrantType is set to Password. |
OAuthIncludeCallbackURL | Whether to include the callback URL in an access token request. |
OAuthAuthorizationURL | The authorization URL for the OAuth service. |
OAuthAccessTokenURL | The URL to retrieve the OAuth access token from. |
OAuthRefreshTokenURL | The URL to refresh the OAuth token from. |
OAuthRequestTokenURL | The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0. |
AuthToken | The authentication token used to request and obtain the OAuth Access Token. |
AuthKey | The authentication secret used to request and obtain the OAuth Access Token. |
OAuthParams | A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value. |
Property | Description |
OAuthJWTCert | The JWT Certificate store. |
OAuthJWTCertType | The type of key store containing the JWT Certificate. |
OAuthJWTCertPassword | The password for the OAuth JWT certificate. |
OAuthJWTCertSubject | The subject of the OAuth JWT certificate. |
OAuthJWTIssuer | The issuer of the JSON Web Token. |
OAuthJWTSubject | The user subject for which the application is requesting delegated access. |
Property | Description |
KerberosKDC | The Kerberos Key Distribution Center (KDC) service used to authenticate the user. |
KerberosRealm | The Kerberos Realm used to authenticate the user. |
KerberosSPN | The service principal name (SPN) for the Kerberos Domain Controller. |
KerberosKeytabFile | The Keytab file containing your pairs of Kerberos principals and encrypted keys. |
KerberosServiceRealm | The Kerberos realm of the service. |
KerberosServiceKDC | The Kerberos KDC of the service. |
KerberosTicketCache | The full file path to an MIT Kerberos credential cache file. |
Property | Description |
SSLClientCert | The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL). |
SSLClientCertType | The type of key store containing the TLS/SSL client certificate. |
SSLClientCertPassword | The password for the TLS/SSL client certificate. |
SSLClientCertSubject | The subject of the TLS/SSL client certificate. |
SSLMode | The authentication mechanism to be used when connecting to the FTP or FTPS server. |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
Property | Description |
SSHAuthMode | The authentication method to be used to log on to an SFTP server. |
SSHClientCert | A private key to be used for authenticating the user. |
SSHClientCertPassword | The password of the SSHClientCert key if it has one. |
SSHClientCertSubject | The subject of the SSH client certificate. |
SSHClientCertType | The type of SSHClientCert private key. |
SSHUser | The SSH user. |
SSHPassword | The SSH password. |
Property | Description |
FirewallType | The protocol used by a proxy-based firewall. |
FirewallServer | The name or IP address of a proxy-based firewall. |
FirewallPort | The TCP port for a proxy-based firewall. |
FirewallUser | The user name to use to authenticate with a proxy-based firewall. |
FirewallPassword | A password used to authenticate to a proxy-based firewall. |
Property | Description |
ProxyAutoDetect | This indicates whether to use the system proxy settings or not. This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings. |
ProxyServer | The hostname or IP address of a proxy to route HTTP traffic through. |
ProxyPort | The TCP port the ProxyServer proxy is running on. |
ProxyAuthScheme | The authentication type to use to authenticate to the ProxyServer proxy. |
ProxyUser | A user name to be used to authenticate to the ProxyServer proxy. |
ProxyPassword | A password to be used to authenticate to the ProxyServer proxy. |
ProxySSLType | The SSL type to use when connecting to the ProxyServer proxy. |
ProxyExceptions | A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer. |
Property | Description |
LogModules | Core modules to be included in the log file. |
Property | Description |
Location | A path to the directory that contains the schema files defining tables, views, and stored procedures. |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC. |
Tables | This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC. |
Views | Restricts the views reported to a subset of the available views. For example, Views=ViewA,ViewB,ViewC. |
FlattenObjects | Set FlattenObjects to true to flatten object properties into columns of their own. Otherwise, objects nested in arrays are returned as strings of JSON. |
FlattenArrays | By default, nested arrays are returned as strings. The FlattenArrays property can be used to flatten the elements of nested arrays into columns of their own. Set FlattenArrays to the number of elements you want to return from nested arrays. |
Property | Description |
AggregateFiles | When set to true, the provider will aggregate all the files in the URI directory into a single result. |
Charset | Specifies the session character set for encoding and decoding character data transferred to and from the Avro file. The default value is UTF-8. |
DeleteDownloadedFiles | When set to true, the provider will delete parsed Avro files downloaded from cloud sources. |
DirectoryRetrievalDepth | Limit the subfolders recursively scanned when IncludeSubdirectories is enabled. |
ExcludeFiles | Comma-separated list of file extensions to exclude from the set of the files modeled as tables. |
IncludeDropboxTeamResources | Indicates if you want to include Dropbox team files and folders. |
IncludeFiles | Comma-separated list of file extensions to include into the set of the files modeled as tables. |
IncludeSubdirectories | Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names. |
InsertMode | The behavior when using bulk inserts to create Avro files. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time. |
MetadataDiscoveryURI | Used when aggregating multiple files into one table, this property specifies a specific file to read to determine the aggregated table schema. |
Other | These hidden properties are used only in specific use cases. |
PageSize | (Optional) PageSize value. |
PathSeparator | Determines the character which will be used to replace the file separator. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns to the table. |
TemporaryLocalFolder | The path, or URI, to the folder that is used to temporarily download Avro file(s). |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
UserDefinedViews | A filepath pointing to the JSON configuration file containing your custom views. |
This section provides a complete list of the Authentication properties you can configure in the connection string for this provider.
Property | Description |
AuthScheme | The type of authentication to use when connecting to remote services. |
AccessKey | Your account access key. This value is accessible from your security credentials page. |
SecretKey | Your account secret key. This value is accessible from your security credentials page. |
ApiKey | The API Key used to identify the user to IBM Cloud. |
User | The user account used to authenticate. |
Password | The password used to authenticate the user. |
SharePointEdition | The edition of SharePoint being used. Set either SharePointOnline or SharePointOnPremise. |
The type of authentication to use when connecting to remote services.
The following options are generally available to all connections:
The following options are available when URI refers to a web service:
The following options are also available when URI points to an Amazon service:
The following options are also available when URI points to an Azure service:
The following options are also available when URI points to a SharePoint SOAP service:
The following options are also available when URI points to an IBM Cloud Object Storage service:
Your account access key. This value is accessible from your security credentials page.
Your account access key. This value is accessible from your security credentials page depending on the service you are using.
Your account secret key. This value is accessible from your security credentials page.
Your account secret key. This value is accessible from your security credentials page depending on the service you are using.
The API Key used to identify the user to IBM Cloud.
Access to resources in the Avro REST API is governed by an API key, which is used to retrieve a token. An API Key can be created by navigating to Manage --> Access (IAM) --> Users and clicking 'Create'.
The user account used to authenticate.
Together with Password, this field is used to authenticate against the server.
This property will refer to different things based on the context, namely the value of ConnectionType and AuthScheme:
The password used to authenticate the user.
The User and Password are together used to authenticate with the server.
This property will refer to different things based on the context, namely the value of ConnectionType and AuthScheme:
This section provides a complete list of the Connection properties you can configure in the connection string for this provider.
Property | Description |
ConnectionType | The type of connection to use. |
URI | The Uniform Resource Identifier (URI) for the Avro resource location. |
DataModel | Specifies the data model to use when parsing Avro documents and generating the database metadata. |
Region | The hosting region for your S3-like Web Services. |
ProjectId | The Id of the project where your Google Cloud Storage instance resides. |
OracleNamespace | The Oracle Cloud Object Storage namespace to use. |
StorageBaseURL | The URL of a cloud storage service provider. |
UseVirtualHosting | If true (default), buckets are referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the Sync App uses the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for an S3-based custom service when the CustomURL is specified. |
The type of connection to use.
Set the ConnectionType to one of the following:
The Uniform Resource Identifier (URI) for the Avro resource location.
Set the URI property to specify a path to a file or stream.
NOTE: this connection property requires that you set ConnectionType, which provide the
See for more advanced features available for parsing and merging multiple files.
Below are examples of the URI formats for the available data sources:
Service provider | URI formats |
Local | Single file path (one table): localPath or file://localPath; directory path (one table per file): localPath or file://localPath |
HTTP or HTTPS | http://remoteStream or https://remoteStream |
Amazon S3 | Single file path (one table): s3://remotePath; directory path (one table per file): s3://remotePath |
Azure Blob Storage | Single file path (one table): azureblob://mycontainer/myblob/; directory path (one table per file): azureblob://mycontainer/myblob/ |
OneDrive | Single file path (one table): onedrive://remotePath; directory path (one table per file): onedrive://remotePath |
Google Cloud Storage | Single file path (one table): gs://bucket/remotePath; directory path (one table per file): gs://bucket/remotePath |
Google Drive | Single file path (one table): gdrive://remotePath; directory path (one table per file): gdrive://remotePath |
Box | Single file path (one table): box://remotePath; directory path (one table per file): box://remotePath |
FTP or FTPS | Single file path (one table): ftp://server:port/remotePath; directory path (one table per file): ftp://server:port/remotePath |
SFTP | Single file path (one table): sftp://server:port/remotePath; directory path (one table per file): sftp://server:port/remotePath |
SharePoint | Single file path (one table): sp://https://server/remotePath; directory path (one table per file): sp://https://server/remotePath |
Below are example connection strings to Avro files or streams.
Service provider | URI formats | Connection example |
Local | Single file path (one table): localPath or file://localPath; directory path (one table per file): localPath or file://localPath | URI=C:\folder1 |
Amazon S3 | Single file path (one table): s3://bucket1/folder1; directory path (one table per file): s3://bucket1/folder1 | URI=s3://bucket1/folder1; AWSAccessKey=token1; AWSSecretKey=secret1; AWSRegion=OHIO; |
Azure Blob Storage | Single file path (one table): azureblob://mycontainer/myblob/; directory path (one table per file): azureblob://mycontainer/myblob/ | URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AzureAccessKey=myKey; or URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AuthScheme=OAuth; |
OneDrive | Single file path (one table): onedrive://remotePath; directory path (one table per file): onedrive://remotePath | URI=onedrive://folder1; AuthScheme=OAuth; or URI=onedrive://SharedWithMe/folder1; AuthScheme=OAuth; |
Google Cloud Storage | Single file path (one table): gs://bucket/remotePath; directory path (one table per file): gs://bucket/remotePath | URI=gs://bucket/folder1; AuthScheme=OAuth; ProjectId=test; |
Google Drive | Single file path (one table): gdrive://remotePath; directory path (one table per file): gdrive://remotePath | URI=gdrive://folder1; |
Box | Single file path (one table): box://remotePath; directory path (one table per file): box://remotePath | URI=box://folder1; OAuthClientId=oauthclientid1; OAuthClientSecret=oauthcliensecret1; CallbackUrl=http://localhost:12345; |
FTP or FTPS | Single file path (one table): ftp://server:port/remotePath; directory path (one table per file): ftp://server:port/remotePath | URI=ftps://localhost:990/folder1; User=user1; Password=password1; |
SFTP | sftp://server:port/remotePath | URI=sftp://127.0.0.1:22/remotePath; User=user1; Password=password1; |
SharePoint | sp://https://server/remotePath | URI=sp://https://domain.sharepoint.com/Documents; User=user1; Password=password1; |
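Connection strings like those in the table above are semicolon-separated name=value pairs, and the URI scheme identifies the service provider. The Python sketch below parses such a string and looks up the provider from the scheme. Both helper names and the lookup table are hypothetical, built from the schemes listed in this section; this is not how the Sync App itself resolves a connection, which relies on the ConnectionType property.

```python
# Scheme-to-provider lookup assembled from the URI formats in this section.
URI_SCHEMES = {
    "file": "Local", "http": "HTTP or HTTPS", "https": "HTTP or HTTPS",
    "s3": "Amazon S3", "azureblob": "Azure Blob Storage",
    "onedrive": "OneDrive", "gs": "Google Cloud Storage",
    "gdrive": "Google Drive", "box": "Box", "ftp": "FTP or FTPS",
    "ftps": "FTP or FTPS", "sftp": "SFTP", "sp": "SharePoint",
}


def parse_connection_string(text):
    """Split 'Name=value; Name=value;' into a dict of properties."""
    properties = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only, so values may contain '='.
        name, _, value = part.partition("=")
        properties[name.strip()] = value.strip()
    return properties


def service_provider(uri):
    """Guess the service provider from the URI scheme."""
    scheme, separator, _ = uri.partition("://")
    if not separator:
        return "Local"  # bare paths such as C:\folder1 have no scheme
    return URI_SCHEMES.get(scheme.lower(), "Unknown")


properties = parse_connection_string(
    "URI=s3://bucket1/folder1; AWSAccessKey=token1; "
    "AWSSecretKey=secret1; AWSRegion=OHIO;"
)
print(service_provider(properties["URI"]), properties["AWSRegion"])
```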
Specifies the data model to use when parsing Avro documents and generating the database metadata.
The Sync App splits documents into rows based on the objects nested in arrays. Select a DataModel configuration to configure how the Sync App models nested object arrays into tables.
The following DataModel configurations are available.
Document
Returns a single table representing a row for each top-level object. In this data model, any nested object arrays will not be flattened and will be returned as aggregates.
FlattenedDocuments
Returns a single table representing a SQL CROSS JOIN of the available documents in the file.
Relational
Returns multiple tables, one for each nested object array. In this data model, any nested documents (object arrays) will be returned as relational tables that contain a primary key and a foreign key that links to the parent table.
The hosting region for your S3-like Web Services.
Value | Region |
Commercial Cloud Regions | |
ap-hyderabad-1 | India South (Hyderabad) |
ap-melbourne-1 | Australia Southeast (Melbourne) |
ap-mumbai-1 | India West (Mumbai) |
ap-osaka-1 | Japan Central (Osaka) |
ap-seoul-1 | South Korea Central (Seoul) |
ap-sydney-1 | Australia East (Sydney) |
ap-tokyo-1 | Japan East (Tokyo) |
ca-montreal-1 | Canada Southeast (Montreal) |
ca-toronto-1 | Canada Southeast (Toronto) |
eu-amsterdam-1 | Netherlands Northwest (Amsterdam) |
eu-frankfurt-1 | Germany Central (Frankfurt) |
eu-zurich-1 | Switzerland North (Zurich) |
me-jeddah-1 | Saudi Arabia West (Jeddah) |
sa-saopaulo-1 | Brazil East (Sao Paulo) |
uk-london-1 | UK South (London) |
us-ashburn-1 (default) | US East (Ashburn, VA) |
us-phoenix-1 | US West (Phoenix, AZ) |
US Gov FedRAMP High Regions | |
us-langley-1 | US Gov East (Ashburn, VA) |
us-luke-1 | US Gov West (Phoenix, AZ) |
US Gov DISA IL5 Regions | |
us-gov-ashburn-1 | US DoD East (Ashburn, VA) |
us-gov-chicago-1 | US DoD North (Chicago, IL) |
us-gov-phoenix-1 | US DoD West (Phoenix, AZ) |
Value | Region |
eu-central-1 | Europe (Amsterdam) |
us-east-1 (Default) | US East (Ashburn, VA) |
us-east-2 | US East (Manassas, VA) |
us-west-1 | US West (Hillsboro, OR) |
The Id of the project where your Google Cloud Storage instance resides.
The Id of the project where your Google Cloud Storage instance resides. You can find this value by going to the Google Cloud Console and clicking the project name at the top left of the screen. The ProjectId is displayed in the Id column of the matching project.
The Oracle Cloud Object Storage namespace to use.
The Oracle Cloud Object Storage namespace to use. This setting must be set to the Oracle Cloud Object Storage namespace associated with the Oracle Cloud account before any requests can be made. Refer to the Understanding Object Storage Namespaces page of the Oracle Cloud documentation for instructions on how to find your account's Object Storage namespace.
The URL of a cloud storage service provider.
This connection property is used to specify:
If the domain for this option ends in -my (for example, https://bigcorp-my.sharepoint.com) then you may need to use the onedrive:// scheme instead of the sp:// or sprest:// scheme.
If true (default), buckets are referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the Sync App uses the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for an S3-based custom service when the CustomURL is specified.
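The two request styles differ only in where the bucket name appears in the URL. The following Python sketch builds both forms from the same inputs; the function name s3_object_url and its parameters are hypothetical and only mirror the two example URLs given above.

```python
def s3_object_url(bucket, key, hosted_style=True, endpoint="s3.amazonaws.com"):
    """Build an object URL in hosted style (bucket in host) or path style (bucket in path)."""
    if hosted_style:
        return f"http://{bucket}.{endpoint}/{key}"
    return f"http://{endpoint}/{bucket}/{key}"


print(s3_object_url("yourbucket", "yourobject"))
print(s3_object_url("yourbucket", "yourobject", hosted_style=False))
```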
This section provides a complete list of the AWS Authentication properties you can configure in the connection string for this provider.
Property | Description |
AWSAccessKey | Your AWS account access key. This value is accessible from your AWS security credentials page. |
AWSSecretKey | Your AWS account secret key. This value is accessible from your AWS security credentials page. |
AWSRoleARN | The Amazon Resource Name of the role to use when authenticating. |
AWSPrincipalARN | The ARN of the SAML Identity provider in your AWS account. |
AWSRegion | The hosting region for your Amazon Web Services. |
AWSCredentialsFile | The path to the AWS Credentials File to be used for authentication. |
AWSCredentialsFileProfile | The name of the profile to be used from the supplied AWSCredentialsFile. |
AWSSessionToken | Your AWS session token. |
MFASerialNumber | The serial number of the MFA device if one is being used. |
MFAToken | The temporary token available from your MFA device. |
ServerSideEncryption | When activated, file uploads into Amazon S3 buckets will be server-side encrypted. |
Your AWS account access key. This value is accessible from your AWS security credentials page.
Your AWS account secret key. This value is accessible from your AWS security credentials page.
The Amazon Resource Name of the role to use when authenticating.
When authenticating outside of AWS, it is common to use a role for authentication instead of your direct AWS account credentials. Entering the AWSRoleARN causes the CData Sync App to perform role-based authentication instead of using the AWSAccessKey and AWSSecretKey directly. The AWSAccessKey and AWSSecretKey must still be specified to perform this authentication. You cannot use the credentials of an AWS root user when setting AWSRoleARN; the AWSAccessKey and AWSSecretKey must be those of an IAM user.
The ARN of the SAML Identity provider in your AWS account.
The hosting region for your Amazon Web Services.
The hosting region for your Amazon Web Services. Available values are OHIO, NORTHERNVIRGINIA, NORTHERNCALIFORNIA, OREGON, CAPETOWN, HONGKONG, JAKARTA, MUMBAI, OSAKA, SEOUL, SINGAPORE, SYDNEY, TOKYO, CENTRAL, BEIJING, NINGXIA, FRANKFURT, IRELAND, LONDON, MILAN, PARIS, STOCKHOLM, ZURICH, BAHRAIN, UAE, SAOPAULO, GOVCLOUDEAST, and GOVCLOUDWEST.
The path to the AWS Credentials File to be used for authentication.
The path to the AWS Credentials File to be used for authentication. See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html for more information.
The name of the profile to be used from the supplied AWSCredentialsFile.
The name of the profile to be used from the supplied AWSCredentialsFile. See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html for more information.
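For orientation, an AWS credentials file is an INI-style file with one section per profile. The Python sketch below reads a single profile with the standard configparser module and maps its keys onto the property names used in this section. The helper load_aws_profile and that mapping are illustrative assumptions, not the Sync App's internal logic, and the key values are the placeholder credentials used in AWS's own documentation.

```python
import configparser
import os
import tempfile


def load_aws_profile(path, profile="default"):
    """Read one profile from an AWS shared credentials file (INI format)."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(path)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found in {path}")
    section = parser[profile]
    return {
        "AWSAccessKey": section.get("aws_access_key_id"),
        "AWSSecretKey": section.get("aws_secret_access_key"),
        "AWSSessionToken": section.get("aws_session_token"),  # None if absent
    }


# Demo with a throwaway file so the sketch runs without a real credentials file.
sample = (
    "[default]\n"
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    "aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as handle:
    handle.write(sample)
try:
    creds = load_aws_profile(handle.name, "default")
finally:
    os.remove(handle.name)
print(creds["AWSAccessKey"])
```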
Your AWS session token.
Your AWS session token. This value can be retrieved in several ways, for example from the AWS Security Token Service (STS) when requesting temporary credentials.
The serial number of the MFA device if one is being used.
You can find the device for an IAM user by going to the AWS Management Console and viewing the user's security credentials. For virtual devices, this is actually an Amazon Resource Name (such as arn:aws:iam::123456789012:mfa/user).
The temporary token available from your MFA device.
If MFA is required, this value is used along with the MFASerialNumber to retrieve temporary credentials to log in. By default, the temporary credentials available from AWS last up to 1 hour (see TemporaryTokenDuration). Once the time is up, the connection must be updated to specify a new MFA token so that new credentials can be obtained.
When activated, file uploads into Amazon S3 buckets will be server-side encrypted.
Server-side encryption is the encryption of data at its destination by the application or service that receives it. Amazon S3 encrypts your data at the object level as it writes it to disks in its data centers and decrypts it for you when you access it. Learn more: https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html
This section provides a complete list of the Azure Authentication properties you can configure in the connection string for this provider.
Property | Description |
AzureStorageAccount | The name of your Azure storage account. |
AzureAccessKey | The storage key associated with your Azure storage account. |
AzureSharedAccessSignature | A shared access key signature that may be used for authentication. |
AzureTenant | The Microsoft Online tenant being used to access data. If not specified, your default tenant will be used. |
AzureEnvironment | The Azure Environment to use when establishing a connection. |
The name of your Azure storage account.
The storage key associated with your Azure storage account.
The storage key associated with your Azure storage account. You can retrieve it from the Access keys page of the storage account in the Azure portal.
The Microsoft Online tenant being used to access data. If not specified, your default tenant will be used.
The Microsoft Online tenant being used to access data. For instance, contoso.onmicrosoft.com. Alternatively, specify the tenant Id. This value is the directory Id in the Azure Portal > Azure Active Directory > Properties.
Typically, it is not necessary to specify the tenant. Microsoft can determine it automatically when OAuthGrantType is set to CODE (default). However, this may fail if the user belongs to multiple tenants, for instance, when an admin of domain A invites a user of domain B to be a guest user, so that the user belongs to both tenants. It is good practice to specify the tenant, although things generally work without it.
The AzureTenant is required when setting OAuthGrantType to CLIENT. When using client credentials, there is no user context. The credentials are taken from the context of the app itself. While Microsoft still allows client credentials to be obtained without specifying which Tenant, it has a much lower probability of picking the specific tenant you want to work with. For this reason, we require AzureTenant to be explicitly stated for all client credentials connections to ensure you get credentials that are applicable for the domain you intend to connect to.
The Azure Environment to use when establishing a connection.
In most cases, leaving the environment set to global will work. However, if your Azure Account has been added to a different environment, the AzureEnvironment may be used to specify which environment. The available values are GLOBAL, CHINA, USGOVT, USGOVTDOD.
This section provides a complete list of the SSO properties you can configure in the connection string for this provider.
Property | Description |
SSOLoginURL | The identity provider's login URL. |
SSOProperties | Additional properties required to connect to the identity provider in a semicolon-separated list. |
The identity provider's login URL.
Additional properties required to connect to the identity provider in a semicolon-separated list.
Additional properties required to connect to the identity provider in a semicolon-separated list. SSOProperties is used in conjunction with the SSOLoginURL.
This section provides a complete list of the OAuth properties you can configure in the connection string for this provider.
Property | Description |
OAuthVersion | The version of OAuth being used. |
OAuthClientId | The client Id assigned when you register your application with an OAuth authorization server. |
OAuthClientSecret | The client secret assigned when you register your application with an OAuth authorization server. |
Scope | Specify scope to obtain the initial access and refresh token. |
OAuthGrantType | The grant type for the OAuth flow. |
OAuthPasswordGrantMode | How to pass the Client Id and Secret when OAuthGrantType is set to Password. |
OAuthIncludeCallbackURL | Whether to include the callback URL in an access token request. |
OAuthAuthorizationURL | The authorization URL for the OAuth service. |
OAuthAccessTokenURL | The URL to retrieve the OAuth access token from. |
OAuthRefreshTokenURL | The URL to refresh the OAuth token from. |
OAuthRequestTokenURL | The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0. |
AuthToken | The authentication token used to request and obtain the OAuth Access Token. |
AuthKey | The authentication secret used to request and obtain the OAuth Access Token. |
OAuthParams | A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value. |
The version of OAuth being used.
The version of OAuth being used. The following options are available: 1.0 and 2.0.
The client Id assigned when you register your application with an OAuth authorization server.
As part of registering an OAuth application, you will receive the OAuthClientId value, sometimes also called a consumer key, and a client secret, the OAuthClientSecret.
The client secret assigned when you register your application with an OAuth authorization server.
As part of registering an OAuth application, you will receive the OAuthClientId, also called a consumer key. You will also receive a client secret, also called a consumer secret. Set the client secret in the OAuthClientSecret property.
Specify scope to obtain the initial access and refresh token.
The grant type for the OAuth flow.
The following options are available: CODE, CLIENT, and PASSWORD.
How to pass the Client Id and Secret when OAuthGrantType is set to Password.
The OAuth RFC specifies two methods of passing the OAuthClientId and OAuthClientSecret when using the Password OAuthGrantType. The most commonly used method is to pass them to the service in the post data. However, some services may require that you pass them in the Authorization header, as in Basic authentication. Change this property to Basic to submit the parameters as part of the Authorization header instead of the post data.
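The difference between the two modes can be sketched in Python as follows. The function build_password_grant_request, the mode names "Post" and "Basic" as string arguments, and the sample credentials are assumptions made for illustration; the sketch only shows where the client credentials end up in a token request, not how the Sync App sends it.

```python
import base64
from urllib.parse import urlencode


def build_password_grant_request(client_id, client_secret, username, password, mode="Post"):
    """Return (headers, body) for a password-grant token request."""
    body = {"grant_type": "password", "username": username, "password": password}
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if mode == "Basic":
        # Client credentials travel in the Authorization header.
        token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    elif mode == "Post":
        # Client credentials travel in the form-encoded post data.
        body["client_id"] = client_id
        body["client_secret"] = client_secret
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return headers, urlencode(body)


post_headers, post_body = build_password_grant_request("id1", "secret1", "user1", "password1")
basic_headers, basic_body = build_password_grant_request(
    "id1", "secret1", "user1", "password1", mode="Basic"
)
print(post_body)
print(basic_headers["Authorization"])
```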
Whether to include the callback URL in an access token request.
This defaults to true, since standards-compliant OAuth services ignore the redirect_uri parameter for grant types like CLIENT or PASSWORD that do not require it.
This option should only be disabled for OAuth services that report errors when redirect_uri is included.
The authorization URL for the OAuth service.
The authorization URL for the OAuth service. At this URL, the user logs into the server and grants permissions to the application. In OAuth 1.0, if permissions are granted, the request token is authorized.
The URL to retrieve the OAuth access token from.
The URL to retrieve the OAuth access token from. In OAuth 1.0, the authorized request token is exchanged for the access token at this URL.
The URL to refresh the OAuth token from.
The URL to refresh the OAuth token from. In OAuth 2.0, this URL is where the refresh token is exchanged for a new access token when the old access token expires.
The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0.
The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0. In OAuth 1.0, this is the URL where the app makes a request for the request token.
The authentication token used to request and obtain the OAuth Access Token.
This property is required only when performing headless authentication in OAuth 1.0. It can be obtained from the GetOAuthAuthorizationUrl stored procedure.
It can be supplied alongside the AuthKey in the GetOAuthAccessToken stored procedure to obtain the OAuthAccessToken.
The authentication secret used to request and obtain the OAuth Access Token.
This property is required only when performing headless authentication in OAuth 1.0. It can be obtained from the GetOAuthAuthorizationUrl stored procedure.
It can be supplied alongside the AuthToken in the GetOAuthAccessToken stored procedure to obtain the OAuthAccessToken.
A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value.
This section provides a complete list of the JWT OAuth properties you can configure in the connection string for this provider.
Property | Description |
OAuthJWTCert | The JWT Certificate store. |
OAuthJWTCertType | The type of key store containing the JWT Certificate. |
OAuthJWTCertPassword | The password for the OAuth JWT certificate. |
OAuthJWTCertSubject | The subject of the OAuth JWT certificate. |
OAuthJWTIssuer | The issuer of the JSON Web Token. |
OAuthJWTSubject | The user subject for which the application is requesting delegated access. |
The JWT Certificate store.
The name of the certificate store for the client certificate.
The OAuthJWTCertType field specifies the type of the certificate store specified by OAuthJWTCert. If the store is password protected, specify the password in OAuthJWTCertPassword.
OAuthJWTCert is used in conjunction with the OAuthJWTCertSubject field in order to specify client certificates. If OAuthJWTCert has a value, and OAuthJWTCertSubject is set, a search for a certificate is initiated. Please refer to the OAuthJWTCertSubject field for details.
Designations of certificate stores are platform-dependent.
The following are designations of the most common User and Machine certificate stores in Windows:
MY | A certificate store holding personal certificates with their associated private keys. |
CA | Certifying authority certificates. |
ROOT | Root certificates. |
SPC | Software publisher certificates. |
In Java, the certificate store normally is a file containing certificates and optional private keys.
When the certificate store type is PFXFile, this property must be set to the name of the file. When the type is PFXBlob, the property must be set to the binary contents of a PFX file (i.e. PKCS12 certificate store).
The type of key store containing the JWT Certificate.
This property can take one of the following values:
USER | For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note: This store type is not available in Java. |
MACHINE | For Windows, this specifies that the certificate store is a machine store. Note: this store type is not available in Java. |
PFXFILE | The certificate store is the name of a PFX (PKCS12) file containing certificates. |
PFXBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in PFX (PKCS12) format. |
JKSFILE | The certificate store is the name of a Java key store (JKS) file containing certificates. Note: this store type is only available in Java. |
JKSBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in Java key store (JKS) format. Note: this store type is only available in Java. |
PEMKEY_FILE | The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate. |
PEMKEY_BLOB | The certificate store is a string (base64-encoded) that contains a private key and an optional certificate. |
PUBLIC_KEY_FILE | The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate. |
PUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate. |
SSHPUBLIC_KEY_FILE | The certificate store is the name of a file that contains an SSH-style public key. |
SSHPUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains an SSH-style public key. |
P7BFILE | The certificate store is the name of a PKCS7 file containing certificates. |
PPKFILE | The certificate store is the name of a file that contains a PPK (PuTTY Private Key). |
XMLFILE | The certificate store is the name of a file that contains a certificate in XML format. |
XMLBLOB | The certificate store is a string that contains a certificate in XML format. |
GOOGLEJSON | The certificate store is the name of a JSON file containing the service account information. Only valid when connecting to a Google service. |
GOOGLEJSONBLOB | The certificate store is a string that contains the service account JSON. Only valid when connecting to a Google service. |
The password for the OAuth JWT certificate.
If the certificate store is of a type that requires a password, this property is used to specify that password in order to open the certificate store.
This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys are not encrypted.
The subject of the OAuth JWT certificate.
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property.
If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma-separated list of distinguished name fields and values. For instance, "CN=www.server.com, OU=test, C=US". Common fields and their meanings are displayed below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma it must be quoted.
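The lookup order described above (the special value "*", then an exact match, then a substring match, then no selection) and the quoting rule for commas can be sketched in Python as follows. Both helper names are hypothetical, the comparison is assumed to be case-sensitive, and the subject with O="Example, Inc." is an invented sample; the Sync App's actual matching may differ in such details.

```python
def split_subject(subject):
    """Split a subject into fields, keeping commas that appear inside double quotes."""
    fields, current, in_quotes = [], [], False
    for ch in subject:
        if ch == '"':
            in_quotes = not in_quotes  # quotes delimit a value and are not kept
        elif ch == "," and not in_quotes:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current).strip())
    return dict(f.split("=", 1) for f in fields if f)


def select_certificate(store_subjects, wanted):
    """Pick a subject from the store: '*', then exact match, then substring match."""
    if not store_subjects:
        return None
    if wanted == "*":
        return store_subjects[0]
    for subject in store_subjects:
        if subject == wanted:
            return subject
    for subject in store_subjects:
        if wanted in subject:
            return subject
    return None  # no certificate is selected


store = ["CN=www.server.com, OU=test, C=US", "CN=other.server.com, OU=prod, C=US"]
print(select_certificate(store, "CN=other.server.com"))
print(split_subject('CN=www.server.com, O="Example, Inc.", C=US'))
```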
The issuer of the JSON Web Token.
The issuer of the JSON Web Token. This is typically either the Client Id or Email Address of the OAuth Application.
This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys contain a copy of the issuer account.
The user subject for which the application is requesting delegated access.
The user subject for which the application is requesting delegated access. Typically, the user account name or email address.
This section provides a complete list of the Kerberos properties you can configure in the connection string for this provider.
Property | Description |
KerberosKDC | The Kerberos Key Distribution Center (KDC) service used to authenticate the user. |
KerberosRealm | The Kerberos Realm used to authenticate the user. |
KerberosSPN | The service principal name (SPN) for the Kerberos Domain Controller. |
KerberosKeytabFile | The Keytab file containing your pairs of Kerberos principals and encrypted keys. |
KerberosServiceRealm | The Kerberos realm of the service. |
KerberosServiceKDC | The Kerberos KDC of the service. |
KerberosTicketCache | The full file path to an MIT Kerberos credential cache file. |
The Kerberos Key Distribution Center (KDC) service used to authenticate the user.
The Kerberos properties are used when using SPNEGO or Windows Authentication. The Sync App will request session tickets and temporary session keys from the Kerberos KDC service. The Kerberos KDC service is conventionally colocated with the domain controller.
If Kerberos KDC is not specified, the Sync App will attempt to detect these properties automatically from the following locations:
The Kerberos Realm used to authenticate the user.
The Kerberos properties are used when using SPNEGO or Windows Authentication. The Kerberos Realm is used to authenticate the user with the Kerberos Key Distribution Service (KDC). The Kerberos Realm can be configured by an administrator to be any string, but conventionally it is based on the domain name.
If Kerberos Realm is not specified, the Sync App will attempt to detect these properties automatically from the following locations:
The service principal name (SPN) for the Kerberos Domain Controller.
If the SPN on the Kerberos Domain Controller is not the same as the URL that you are authenticating to, use this property to set the SPN.
The Keytab file containing your pairs of Kerberos principals and encrypted keys.
The Kerberos realm of the service.
The KerberosServiceRealm is used to specify the service Kerberos realm when using cross-realm Kerberos authentication.
In most cases, a single realm and KDC machine are used to perform the Kerberos authentication and this property is not required.
This property is available for complex setups where a different realm and KDC machine are used to obtain an authentication ticket (AS request) and a service ticket (TGS request).
The Kerberos KDC of the service.
The KerberosServiceKDC is used to specify the service Kerberos KDC when using cross-realm Kerberos authentication.
In most cases, a single realm and KDC machine are used to perform the Kerberos authentication and this property is not required.
This property is available for complex setups where a different realm and KDC machine are used to obtain an authentication ticket (AS request) and a service ticket (TGS request).
The full file path to an MIT Kerberos credential cache file.
This property can be set if you wish to use a credential cache file that was created using the MIT Kerberos Ticket Manager or kinit command.
This section provides a complete list of the SSL properties you can configure in the connection string for this provider.
Property | Description |
SSLClientCert | The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL). |
SSLClientCertType | The type of key store containing the TLS/SSL client certificate. |
SSLClientCertPassword | The password for the TLS/SSL client certificate. |
SSLClientCertSubject | The subject of the TLS/SSL client certificate. |
SSLMode | The authentication mechanism to be used when connecting to the FTP or FTPS server. |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL).
The name of the certificate store for the client certificate.
The SSLClientCertType field specifies the type of the certificate store specified by SSLClientCert. If the store is password protected, specify the password in SSLClientCertPassword.
SSLClientCert is used in conjunction with the SSLClientCertSubject field in order to specify client certificates. If SSLClientCert has a value, and SSLClientCertSubject is set, a search for a certificate is initiated. See SSLClientCertSubject for more information.
Designations of certificate stores are platform-dependent.
The following are designations of the most common User and Machine certificate stores in Windows:
MY | A certificate store holding personal certificates with their associated private keys. |
CA | Certifying authority certificates. |
ROOT | Root certificates. |
SPC | Software publisher certificates. |
In Java, the certificate store normally is a file containing certificates and optional private keys.
When the certificate store type is PFXFile, this property must be set to the name of the file. When the type is PFXBlob, the property must be set to the binary contents of a PFX file (for example, PKCS12 certificate store).
The type of key store containing the TLS/SSL client certificate.
This property can take one of the following values:
USER - default | For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note that this store type is not available in Java. |
MACHINE | For Windows, this specifies that the certificate store is a machine store. Note that this store type is not available in Java. |
PFXFILE | The certificate store is the name of a PFX (PKCS12) file containing certificates. |
PFXBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in PFX (PKCS12) format. |
JKSFILE | The certificate store is the name of a Java key store (JKS) file containing certificates. Note that this store type is only available in Java. |
JKSBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in JKS format. Note that this store type is only available in Java. |
PEMKEY_FILE | The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate. |
PEMKEY_BLOB | The certificate store is a string (base64-encoded) that contains a private key and an optional certificate. |
PUBLIC_KEY_FILE | The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate. |
PUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate. |
SSHPUBLIC_KEY_FILE | The certificate store is the name of a file that contains an SSH-style public key. |
SSHPUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains an SSH-style public key. |
P7BFILE | The certificate store is the name of a PKCS7 file containing certificates. |
PPKFILE | The certificate store is the name of a file that contains a PuTTY Private Key (PPK). |
XMLFILE | The certificate store is the name of a file that contains a certificate in XML format. |
XMLBLOB | The certificate store is a string that contains a certificate in XML format. |
The password for the TLS/SSL client certificate.
If the certificate store is of a type that requires a password, this property is used to specify that password to open the certificate store.
The subject of the TLS/SSL client certificate.
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property. If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma-separated list of distinguished name fields and values. For example, "CN=www.server.com, OU=test, C=US". The common fields and their meanings are shown below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma, it must be quoted.
The authentication mechanism to be used when connecting to the FTP or FTPS server.
If SSLMode is set to NONE, default plaintext authentication is used to log in to the server. If SSLMode is set to IMPLICIT, SSL negotiation starts immediately after the connection is established. If SSLMode is set to EXPLICIT, the Sync App first connects in plaintext and then explicitly starts SSL negotiation through a protocol command such as STARTTLS. If SSLMode is set to AUTOMATIC and the remote port is the standard plaintext port of the protocol (where applicable), the Sync App behaves as if SSLMode were set to EXPLICIT. In all other cases, SSL negotiation is IMPLICIT.
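The AUTOMATIC rule can be summarized as a small decision function. This Python sketch assumes FTP, whose standard plaintext control port is 21; the function name resolve_ssl_mode and its parameters are hypothetical and only restate the behavior described above.

```python
def resolve_ssl_mode(ssl_mode, port, plaintext_port=21):
    """Return the effective negotiation style for a given SSLMode and remote port."""
    mode = ssl_mode.upper()
    if mode in ("NONE", "IMPLICIT", "EXPLICIT"):
        return mode
    if mode == "AUTOMATIC":
        # Standard plaintext port: connect in plaintext first, then upgrade (EXPLICIT).
        # Any other port: negotiate SSL immediately (IMPLICIT).
        return "EXPLICIT" if port == plaintext_port else "IMPLICIT"
    raise ValueError(f"unknown SSLMode: {ssl_mode}")


print(resolve_ssl_mode("AUTOMATIC", 21))
print(resolve_ssl_mode("AUTOMATIC", 990))
```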
The certificate to be accepted from the server when connecting using TLS/SSL.
If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine is rejected.
This property can take the following forms:
Description | Example |
A full PEM Certificate (example shortened for brevity) | -----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE----- |
A path to a local file containing the certificate | C:\cert.cer |
The public key (example shortened for brevity) | -----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY----- |
The MD5 Thumbprint (hex values can also be either space or colon separated) | ecadbdda5a1529c58a1e9e09828d70e4 |
The SHA1 Thumbprint (hex values can also be either space or colon separated) | 34a929226ae0819f2ec14b4a3d904f801cbb150d |
If not specified, any certificate trusted by the machine is accepted.
Use '*' to signify to accept all certificates. Note that this is not recommended due to security concerns.
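A certificate thumbprint is the hash of the certificate's DER encoding, so the MD5 and SHA1 forms in the table above can be computed from a PEM certificate with the Python standard library. The sketch below also normalizes space- or colon-separated hex before comparing. The helper names are hypothetical, and this illustrates the thumbprint formats only, not the Sync App's own validation code.

```python
import hashlib
import ssl


def thumbprint(pem_cert, algorithm="sha1"):
    """Hash the DER bytes of a PEM certificate and return lowercase hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.new(algorithm, der).hexdigest()


def normalize_thumbprint(value):
    """Drop space or colon separators and lowercase the hex digits."""
    return value.replace(":", "").replace(" ", "").lower()


def matches_thumbprint(pem_cert, expected):
    """Compare a certificate to an MD5 (32 hex) or SHA1 (40 hex) thumbprint."""
    expected = normalize_thumbprint(expected)
    algorithm = {32: "md5", 40: "sha1"}.get(len(expected))
    if algorithm is None:
        raise ValueError("expected an MD5 or SHA1 thumbprint")
    return thumbprint(pem_cert, algorithm) == expected
```

A value copied from a certificate viewer as colon-separated uppercase hex can be passed to matches_thumbprint as is; it is reduced to the plain lowercase form shown in the table before the comparison.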
This section provides a complete list of the SSH properties you can configure in the connection string for this provider.
Property | Description |
SSHAuthMode | The authentication method to be used to log on to an SFTP server. |
SSHClientCert | A private key to be used for authenticating the user. |
SSHClientCertPassword | The password of the SSHClientCert key if it has one. |
SSHClientCertSubject | The subject of the SSH client certificate. |
SSHClientCertType | The type of SSHClientCert private key. |
SSHUser | The SSH user. |
SSHPassword | The SSH password. |
The authentication method to be used to log on to an SFTP server.
A private key to be used for authenticating the user.
SSHClientCert must contain a valid private key in order to use public key authentication. A public key is optional; if one is not included, the Sync App generates it from the private key. The Sync App sends the public key to the server, and the connection is allowed if the user has authorized the public key.
The SSHClientCertType field specifies the type of the key store specified by SSHClientCert. If the store is password protected, specify the password in SSHClientCertPassword.
Some types of key stores are containers which may include multiple keys. By default the Sync App will select the first key in the store, but you can specify a specific key using SSHClientCertSubject.
The password of the SSHClientCert key if it has one.
This property is only used when authenticating to SFTP servers with SSHAuthMode set to PublicKey and SSHClientCert set to a private key.
The subject of the SSH client certificate.
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property.
If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma separated list of distinguished name fields and values. For instance "CN=www.server.com, OU=test, C=US, [email protected]". Common fields and their meanings are displayed below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma it must be quoted.
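The lookup order described above (exact match, then substring match, then no selection, with "*" picking the first entry) can be sketched as follows. This is an illustration only: `select_subject` is a hypothetical name, the store is modeled as a plain list of subject strings, and case-sensitive comparison is an assumption the source does not confirm.

```python
from typing import List, Optional


def select_subject(requested: str, store_subjects: List[str]) -> Optional[str]:
    """Return the subject picked from the store, or None if nothing matches."""
    if not store_subjects:
        return None
    if requested == "*":
        return store_subjects[0]        # "*" picks the first certificate
    for subject in store_subjects:      # 1. look for an exact match
        if subject == requested:
            return subject
    for subject in store_subjects:      # 2. fall back to subjects containing the value
        if requested in subject:
            return subject
    return None                         # 3. no certificate is selected
```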
The type of SSHClientCert private key.
This property can take one of the following values:
Types | Description | Allowed Blob Values |
MACHINE/USER | | Blob values are not supported. |
JKSFILE/JKSBLOB | | base64-only |
PFXFILE/PFXBLOB | A PKCS12-format (.pfx) file. Must contain both a certificate and a private key. | base64-only |
PEMKEY_FILE/PEMKEY_BLOB | A PEM-format file. Must contain an RSA, DSA, or OPENSSH private key. Can optionally contain a certificate matching the private key. | base64 or plain text. Newlines may be replaced with spaces when providing the blob as text. |
PPKFILE/PPKBLOB | A PuTTY-format private key created using the puttygen tool. | base64-only |
XMLFILE/XMLBLOB | An XML key in the format generated by the .NET RSA class: RSA.ToXmlString(true). | base64 or plain text. |
The SSH user.
The SSH password.
This section provides a complete list of the Firewall properties you can configure in the connection string for this provider.
Property | Description |
FirewallType | The protocol used by a proxy-based firewall. |
FirewallServer | The name or IP address of a proxy-based firewall. |
FirewallPort | The TCP port for a proxy-based firewall. |
FirewallUser | The user name to use to authenticate with a proxy-based firewall. |
FirewallPassword | A password used to authenticate to a proxy-based firewall. |
The protocol used by a proxy-based firewall.
This property specifies the protocol that the Sync App will use to tunnel traffic through the FirewallServer proxy. Note that by default, the Sync App connects to the system proxy; to disable this behavior and connect to one of the following proxy types, set ProxyAutoDetect to false.
Type | Default Port | Description |
TUNNEL | 80 | When this is set, the Sync App opens a connection to Avro and traffic flows back and forth through the proxy. |
SOCKS4 | 1080 | When this is set, the Sync App sends data through the SOCKS 4 proxy specified by FirewallServer and FirewallPort and passes the FirewallUser value to the proxy, which determines if the connection request should be granted. |
SOCKS5 | 1080 | When this is set, the Sync App sends data through the SOCKS 5 proxy specified by FirewallServer and FirewallPort. If your proxy requires authentication, set FirewallUser and FirewallPassword to credentials the proxy recognizes. |
To connect to HTTP proxies, use ProxyServer and ProxyPort. To authenticate to HTTP proxies, use ProxyAuthScheme, ProxyUser, and ProxyPassword.
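As a rough illustration of how the firewall properties fit together, the sketch below assembles them into a semicolon-separated property string, filling in the default port from the table above when none is given. The helper names are hypothetical, and the `Key=Value;Key=Value` layout is an assumption about the connection string format.

```python
from typing import Dict, Optional

# Default ports taken from the FirewallType table.
DEFAULT_FIREWALL_PORTS = {"TUNNEL": 80, "SOCKS4": 1080, "SOCKS5": 1080}


def firewall_properties(firewall_type: str, server: str,
                        port: Optional[int] = None,
                        user: Optional[str] = None,
                        password: Optional[str] = None) -> Dict[str, str]:
    """Collect the firewall settings, disabling system proxy detection."""
    kind = firewall_type.upper()
    if kind not in DEFAULT_FIREWALL_PORTS:
        raise ValueError(f"unsupported FirewallType: {firewall_type}")
    chosen_port = port if port is not None else DEFAULT_FIREWALL_PORTS[kind]
    props = {
        "ProxyAutoDetect": "false",
        "FirewallType": kind,
        "FirewallServer": server,
        "FirewallPort": str(chosen_port),
    }
    if user is not None:
        props["FirewallUser"] = user
    if password is not None:
        props["FirewallPassword"] = password
    return props


def to_connection_string(props: Dict[str, str]) -> str:
    """Join properties as Key=Value pairs separated by semicolons."""
    return ";".join(f"{key}={value}" for key, value in props.items())
```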
The name or IP address of a proxy-based firewall.
This property specifies the IP address, DNS name, or host name of a proxy allowing traversal of a firewall. The protocol is specified by FirewallType: Use FirewallServer with this property to connect through SOCKS or do tunneling. Use ProxyServer to connect to an HTTP proxy.
Note that the Sync App uses the system proxy by default. To use a different proxy, set ProxyAutoDetect to false.
The TCP port for a proxy-based firewall.
This specifies the TCP port for a proxy allowing traversal of a firewall. Use FirewallServer to specify the name or IP address. Specify the protocol with FirewallType.
The user name to use to authenticate with a proxy-based firewall.
The FirewallUser and FirewallPassword properties are used to authenticate against the proxy specified in FirewallServer and FirewallPort, following the authentication method specified in FirewallType.
A password used to authenticate to a proxy-based firewall.
This property is passed to the proxy specified by FirewallServer and FirewallPort, following the authentication method specified by FirewallType.
This section provides a complete list of the Proxy properties you can configure in the connection string for this provider.
Property | Description |
ProxyAutoDetect | This indicates whether to use the system proxy settings or not. This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings. |
ProxyServer | The hostname or IP address of a proxy to route HTTP traffic through. |
ProxyPort | The TCP port the ProxyServer proxy is running on. |
ProxyAuthScheme | The authentication type to use to authenticate to the ProxyServer proxy. |
ProxyUser | A user name to be used to authenticate to the ProxyServer proxy. |
ProxyPassword | A password to be used to authenticate to the ProxyServer proxy. |
ProxySSLType | The SSL type to use when connecting to the ProxyServer proxy. |
ProxyExceptions | A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer. |
This indicates whether to use the system proxy settings or not. This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings.
To connect to an HTTP proxy, see ProxyServer. For other proxies, such as SOCKS or tunneling, see FirewallType.
The hostname or IP address of a proxy to route HTTP traffic through.
The Sync App can use the HTTP, Windows (NTLM), or Kerberos authentication types to authenticate to an HTTP proxy.
If you need to connect through a SOCKS proxy or tunnel the connection, see FirewallType.
By default, the Sync App uses the system proxy. If you need to use another proxy, set ProxyAutoDetect to false.
The TCP port the ProxyServer proxy is running on.
The port the HTTP proxy is running on that you want to redirect HTTP traffic through. Specify the HTTP proxy in ProxyServer. For other proxy types, see FirewallType.
The authentication type to use to authenticate to the ProxyServer proxy.
This value specifies the authentication type to use to authenticate to the HTTP proxy specified by ProxyServer and ProxyPort.
Note that the Sync App will use the system proxy settings by default, without further configuration needed; if you want to connect to another proxy, you will need to set ProxyAutoDetect to false, in addition to ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.
The authentication type can be one of the following:
If you need to use another authentication type, such as SOCKS 5 authentication, see FirewallType.
A user name to be used to authenticate to the ProxyServer proxy.
The ProxyUser and ProxyPassword options are used to connect and authenticate against the HTTP proxy specified in ProxyServer.
You can select one of the available authentication types in ProxyAuthScheme. If you are using HTTP authentication, set this to the user name of a user recognized by the HTTP proxy. If you are using Windows or Kerberos authentication, set this property to a user name in one of the following formats:
user@domain
domain\user
A password to be used to authenticate to the ProxyServer proxy.
This property is used to authenticate to an HTTP proxy server that supports NTLM (Windows), Kerberos, or HTTP authentication. To specify the HTTP proxy, you can set ProxyServer and ProxyPort. To specify the authentication type, set ProxyAuthScheme.
If you are using HTTP authentication, additionally set ProxyUser and ProxyPassword to the credentials recognized by the HTTP proxy.
If you are using NTLM authentication, set ProxyUser and ProxyPassword to your Windows user name and password. You may also need these to complete Kerberos authentication.
For SOCKS 5 authentication or tunneling, see FirewallType.
By default, the Sync App uses the system proxy. If you want to connect to another proxy, set ProxyAutoDetect to false.
The SSL type to use when connecting to the ProxyServer proxy.
This property determines when to use SSL for the connection to an HTTP proxy specified by ProxyServer. This value can be AUTO, ALWAYS, NEVER, or TUNNEL. The applicable values are the following:
AUTO | Default setting. If the URL is an HTTPS URL, the Sync App will use the TUNNEL option. If the URL is an HTTP URL, the Sync App will use the NEVER option. |
ALWAYS | The connection is always SSL enabled. |
NEVER | The connection is not SSL enabled. |
TUNNEL | The connection is through a tunneling proxy. The proxy server opens a connection to the remote host and traffic flows back and forth through the proxy. |
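The AUTO rule in the table above can be expressed as a small function. This is a sketch only: `effective_proxy_ssl_type` is a hypothetical name, and treating any non-HTTPS scheme as NEVER is an assumption, since the source only describes HTTP and HTTPS URLs.

```python
from urllib.parse import urlparse

VALID_SSL_TYPES = {"AUTO", "ALWAYS", "NEVER", "TUNNEL"}


def effective_proxy_ssl_type(setting: str, url: str) -> str:
    """Resolve AUTO to TUNNEL or NEVER based on the URL scheme."""
    value = setting.upper()
    if value not in VALID_SSL_TYPES:
        raise ValueError(f"unknown ProxySSLType: {setting}")
    if value != "AUTO":
        return value                    # explicit settings are used as given
    scheme = urlparse(url).scheme.lower()
    # HTTPS URLs use TUNNEL; HTTP (and, by assumption, anything else) uses NEVER.
    return "TUNNEL" if scheme == "https" else "NEVER"
```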
A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer.
The ProxyServer is used for all addresses, except for addresses defined in this property. Use semicolons to separate entries.
Note that the Sync App uses the system proxy settings by default, without further configuration needed; if you want to explicitly configure proxy exceptions for this connection, you need to set ProxyAutoDetect = false, and configure ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.
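To make the exception list concrete, the sketch below checks whether a destination would be routed through the proxy. It assumes exact, case-insensitive matching of each semicolon-separated entry; the source does not say whether wildcards or subnets are supported, and `uses_proxy` is a hypothetical helper.

```python
def uses_proxy(host: str, proxy_exceptions: str) -> bool:
    """Return False when host appears in the semicolon-separated exception list."""
    exempt = {
        entry.strip().lower()
        for entry in proxy_exceptions.split(";")
        if entry.strip()                # ignore empty entries such as a trailing ';'
    }
    return host.strip().lower() not in exempt
```

For example, with `ProxyExceptions=localhost;10.0.0.5`, a request to `10.0.0.5` bypasses the proxy while a request to `10.0.0.6` does not.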
This section provides a complete list of the Logging properties you can configure in the connection string for this provider.
Property | Description |
LogModules | Core modules to be included in the log file. |
Core modules to be included in the log file.
Only the modules specified (separated by ';') will be included in the log file. By default all modules are included.
See the Logging page for an overview.
This section provides a complete list of the Schema properties you can configure in the connection string for this provider.
Property | Description |
Location | A path to the directory that contains the schema files defining tables, views, and stored procedures. |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC. |
Tables | This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC. |
Views | This property restricts the views reported to a subset of the available views. For example, Views=ViewA,ViewB,ViewC. |
FlattenObjects | Set FlattenObjects to true to flatten object properties into columns of their own. Otherwise, objects nested in arrays are returned as strings of JSON. |
FlattenArrays | By default, nested arrays are returned as strings. The FlattenArrays property can be used to flatten the elements of nested arrays into columns of their own. Set FlattenArrays to the number of elements you want to return from nested arrays. |
A path to the directory that contains the schema files defining tables, views, and stored procedures.
The path to a directory which contains the schema files for the Sync App (.rsd files for tables and views, .rsb files for stored procedures). The folder location can be a relative path from the location of the executable. The Location property is only needed if you want to customize definitions (for example, change a column name, ignore a column, and so on) or extend the data model with new tables, views, or stored procedures.
If left unspecified, the default location is "%APPDATA%\CData\Avro Data Provider\Schema", where %APPDATA% is set to the user's configuration directory.
This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.
Listing the schemas from databases can be expensive. Providing a list of schemas in the connection string improves the performance.
This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC.
Listing the tables from some databases can be expensive. Providing a list of tables in the connection string improves the performance of the Sync App.
This property can also be used as an alternative to automatically listing tables if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the tables you want in a comma-separated list. Each table should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the table in this property, as in the last example here, to avoid ambiguity between tables that exist in multiple catalogs or schemas.
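A value for this property can be assembled programmatically. The sketch below is a simplified illustration that uses only the square-bracket form of escaping: names made of letters, digits, and underscores are passed through, and anything else is wrapped in brackets. The helper names are hypothetical, and the sketch deliberately rejects names containing `]` rather than guess at an escape rule.

```python
import re
from typing import Iterable

# Names that need no escaping: letters, digits, underscores, not starting with a digit.
_PLAIN_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_identifier(name: str) -> str:
    """Wrap a name in square brackets when it contains special characters."""
    if _PLAIN_IDENTIFIER.match(name):
        return name
    if "]" in name:
        raise ValueError("this sketch does not escape ']' inside names")
    return f"[{name}]"


def tables_property(names: Iterable[str]) -> str:
    """Build a comma-separated Tables=... value from table names."""
    return "Tables=" + ",".join(quote_identifier(name) for name in names)
```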
This property restricts the views reported to a subset of the available views. For example, Views=ViewA,ViewB,ViewC.
Listing the views from some databases can be expensive. Providing a list of views in the connection string improves the performance of the Sync App.
This property can also be used as an alternative to automatically listing views if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the views you want in a comma-separated list. Each view should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the view in this property, as in the last example here, to avoid ambiguity between views that exist in multiple catalogs or schemas.
Set FlattenObjects to true to flatten object properties into columns of their own. Otherwise, objects nested in arrays are returned as strings of JSON.
To generate the column name, the Sync App concatenates the property name onto the object name with a dot.
For example, you can flatten the nested objects below at connection time:
[ { "grade": "A", "score": 2 }, { "grade": "A", "score": 6 }, { "grade": "A", "score": 10 }, { "grade": "A", "score": 9 }, { "grade": "B", "score": 14 } ]
When FlattenObjects is set to true and FlattenArrays is set to 1, the preceding array is flattened into the following table:
Column Name | Column Value |
grades.0.grade | A |
grades.0.score | 2 |
By default, nested arrays are returned as strings. The FlattenArrays property can be used to flatten the elements of nested arrays into columns of their own. Set FlattenArrays to the number of elements you want to return from nested arrays.
This is only recommended for arrays that are expected to be short.
Set FlattenArrays to the number of elements you want to return from nested arrays. The specified elements are returned as columns. The zero-based index is concatenated to the column name. Other elements are ignored.
For example, you can return an arbitrary number of elements from an array of strings:
["FLOW-MATIC","LISP","COBOL"]
When FlattenArrays is set to 1, the preceding array is flattened into the following table:
Column Name | Column Value |
languages.0 | FLOW-MATIC |
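The naming scheme used in the two flattening tables (property names joined with a dot, zero-based array indexes appended, extra elements ignored) can be reproduced with a short recursive function. This is a sketch of the naming rule only, not the Sync App's implementation: `flatten` is a hypothetical name, and the sample arrays are assumed to sit under properties named `grades` and `languages`, as the column names in the tables suggest.

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], flatten_objects: bool = True,
            flatten_arrays: int = 0) -> Dict[str, Any]:
    """Flatten nested objects and the first N array elements into columns."""
    columns: Dict[str, Any] = {}

    def visit(name: str, value: Any) -> None:
        if isinstance(value, dict):
            if flatten_objects:
                for key, child in value.items():
                    visit(f"{name}.{key}", child)      # object.property
            else:
                columns[name] = json.dumps(value)      # kept as a JSON string
        elif isinstance(value, list):
            if flatten_arrays > 0:
                # Only the first N elements become columns; the rest are ignored.
                for index, child in enumerate(value[:flatten_arrays]):
                    visit(f"{name}.{index}", child)    # array.index
            else:
                columns[name] = json.dumps(value)      # kept as a JSON string
        else:
            columns[name] = value

    for key, value in record.items():
        visit(key, value)
    return columns
```

Calling `flatten({"languages": ["FLOW-MATIC", "LISP", "COBOL"]}, flatten_arrays=1)` yields a single `languages.0` column, matching the table above.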
This section provides a complete list of the Miscellaneous properties you can configure in the connection string for this provider.
Property | Description |
AggregateFiles | When set to true, the provider will aggregate all the files in the URI directory into a single result. |
Charset | Specifies the session character set for encoding and decoding character data transferred to and from the Avro file. The default value is UTF-8. |
DeleteDownloadedFiles | When set to true, the provider will delete parsed Avro files downloaded from cloud sources. |
DirectoryRetrievalDepth | Limit the subfolders recursively scanned when IncludeSubdirectories is enabled. |
ExcludeFiles | Comma-separated list of file extensions to exclude from the set of the files modeled as tables. |
IncludeDropboxTeamResources | Indicates if you want to include Dropbox team files and folders. |
IncludeFiles | Comma-separated list of file extensions to include in the set of the files modeled as tables. |
IncludeSubdirectories | Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names. |
InsertMode | The behavior when using bulk inserts to create Avro files. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time. |
MetadataDiscoveryURI | Used when aggregating multiple files into one table, this property specifies a specific file to read to determine the aggregated table schema. |
Other | These hidden properties are used only in specific use cases. |
PageSize | (Optional) PageSize value. |
PathSeparator | Determines the character which will be used to replace the file separator. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns to the table. |
TemporaryLocalFolder | The path, or URI, to the folder that is used to temporarily download Avro file(s). |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
UserDefinedViews | A filepath pointing to the JSON configuration file containing your custom views. |
When set to true, the provider will aggregate all the files in the URI directory into a single result.
With this option enabled, an AggregatedFiles table is exposed, which can be used to query the dataset. By default, the first file in the folder is used to define the schema.
Specifies the session character set for encoding and decoding character data transferred to and from the Avro file. The default value is UTF-8.
When set to true, the provider will delete parsed Avro files downloaded from cloud sources.
When set to false, downloaded files are stored in the directory specified by the TemporaryLocalFolder connection property.
Limit the subfolders recursively scanned when IncludeSubdirectories is enabled.
When IncludeSubdirectories is enabled, DirectoryRetrievalDepth specifies how many subfolders will be recursively scanned before stopping. -1 specifies that all subfolders are scanned.
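A depth-limited recursive scan of this kind can be sketched with `os.walk`. The function name is hypothetical, and the sketch assumes that a depth of 0 means only the root folder is read and a depth of 1 adds its immediate subfolders; the source only states that -1 scans all subfolders.

```python
import os
from typing import Iterator


def iter_files(root: str, depth: int = -1) -> Iterator[str]:
    """Yield file paths under root, descending at most `depth` subfolder levels.

    depth=-1 scans every subfolder; depth=0 is assumed to mean the root only.
    """
    root = os.path.abspath(root)
    base_level = root.rstrip(os.sep).count(os.sep)
    for current, dirs, files in os.walk(root):
        level = current.rstrip(os.sep).count(os.sep) - base_level
        if depth >= 0 and level >= depth:
            dirs[:] = []                # stop os.walk from descending further
        for name in sorted(files):
            yield os.path.join(current, name)
```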
Comma-separated list of file extensions to exclude from the set of the files modeled as tables.
It is also possible to specify datetime filters. CreatedDate and ModifiedDate are currently supported. All extension filters are evaluated in disjunction (using the OR operator), and then the resulting filter is evaluated in conjunction (using the AND operator) with the datetime filters.
Examples:
ExcludeFiles="TXT,CreatedDate<='2020-11-26T07:39:34-05:00'"
ExcludeFiles="TXT,ModifiedDate<=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 50, 000)"
ExcludeFiles="ModifiedDate>=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 49, 000),ModifiedDate<=CURRENT_TIMESTAMP()"
Indicates if you want to include Dropbox team files and folders.
In order to access Dropbox team folders and files, please set this connection property to True.
Comma-separated list of file extensions to include in the set of the files modeled as tables.
For example, IncludeFiles=avro,TXT. The default is avro.
A '*' value can be specified to include all files. A 'NOEXT' value can be specified to include files without an extension.
It is also possible to specify datetime filters. CreatedDate and ModifiedDate are currently supported. All extension filters are evaluated in disjunction (using the OR operator), and then the resulting filter is evaluated in conjunction (using the AND operator) with the datetime filters.
Examples:
IncludeFiles="TXT,CreatedDate<='2020-11-26T07:39:34-05:00'"
IncludeFiles="TXT,ModifiedDate<=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 50, 000)"
IncludeFiles="ModifiedDate>=DATETIMEFROMPARTS(2020, 11, 26, 7, 40, 49, 000),ModifiedDate<=CURRENT_TIMESTAMP()"
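The way extension and datetime filters combine can be illustrated with a small predicate: a file passes if it matches any listed extension and also satisfies every datetime condition. This is a sketch under several assumptions not stated in the source: extension matching is case-insensitive, an empty extension list places no restriction on extensions, and multiple datetime filters are combined with AND. `FileInfo` and `is_selected` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class FileInfo:
    name: str
    created: datetime
    modified: datetime


def is_selected(info: FileInfo, extensions: Iterable[str],
                date_filters: Iterable[Callable[[FileInfo], bool]]) -> bool:
    """OR across extension filters, then AND with the datetime filters."""
    # Files without a dot in the name are treated as having the NOEXT extension.
    ext = info.name.rsplit(".", 1)[1].upper() if "." in info.name else "NOEXT"
    wanted = {e.upper() for e in extensions}
    ext_ok = not wanted or "*" in wanted or ext in wanted
    return ext_ok and all(check(info) for check in date_filters)
```

For instance, `IncludeFiles="TXT,CreatedDate<='2020-11-26T07:39:34-05:00'"` corresponds to `extensions=["TXT"]` plus one filter comparing `info.created` to that timestamp.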
Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names.
Table names are prefixed by each nested folder name, separated by underscores. For example:
Root\subfolder1\tableA | Root\subfolder1\subfolder2\tableA |
subfolder1_tableA | subfolder1_subfolder2_tableA |
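The mapping in the table above amounts to joining the path components below the root folder with underscores. A minimal sketch, using a hypothetical `table_name` helper and `PureWindowsPath` so that both backslash and forward-slash paths are accepted:

```python
from pathlib import PureWindowsPath


def table_name(root: str, file_path: str) -> str:
    """Join the folder names below root and the file name with underscores."""
    relative = PureWindowsPath(file_path).relative_to(PureWindowsPath(root))
    return "_".join(relative.parts)
```

Here `table_name("Root", r"Root\subfolder1\tableA")` produces `subfolder1_tableA`.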
The behavior when using bulk inserts to create Avro files.
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time.
Used when aggregating multiple files into one table, this property specifies a specific file to read to determine the aggregated table schema.
These hidden properties are used only in specific use cases.
The properties listed below are available for specific use cases. Normal driver use cases and functionality should not require these properties.
Specify multiple properties in a semicolon-separated list.
DefaultColumnSize | Sets the default length of string fields when the data source does not provide column length in the metadata. The default value is 2000. |
ConvertDateTimeToGMT | Determines whether to convert date-time values to GMT, instead of the local time of the machine. |
RecordToFile=filename | Records the underlying socket data transfer to the specified file. |
(Optional) PageSize value.
The PageSize value specifies the number of rows to fetch at a time.
Determines the character which will be used to replace the file separator.
If there is an Avro file located in "Test/Files/Test.avro" and this property is set to "_", then the table name for this file would be "Test_Files_Test.avro".
This property indicates whether or not to include pseudo columns as columns to the table.
This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; for example, "*=*".
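The "Table=Column" list format described above can be parsed into a per-table mapping as shown in this sketch. `parse_pseudo_columns` is a hypothetical helper for illustration; it treats "*" as an ordinary key and leaves wildcard expansion to the caller.

```python
from collections import defaultdict
from typing import Dict, List


def parse_pseudo_columns(value: str) -> Dict[str, List[str]]:
    """Parse 'Table1=Column1, Table1=Column2' into {table: [columns]}."""
    result: Dict[str, List[str]] = defaultdict(list)
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue                    # tolerate empty entries
        table, sep, column = pair.partition("=")
        if not sep or not table.strip() or not column.strip():
            raise ValueError(f"expected Table=Column, got {pair!r}")
        result[table.strip()].append(column.strip())
    return dict(result)
```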
The path, or URI, to the folder that is used to temporarily download Avro file(s).
For instance: TemporaryLocalFolder='C:/User/Download'
The value in seconds until the timeout error is thrown, canceling the operation.
If Timeout = 0, operations do not time out. The operations run until they complete successfully or until they encounter an error condition.
If Timeout expires and the operation is not yet complete, the Sync App throws an exception.
A filepath pointing to the JSON configuration file containing your custom views.
User Defined Views are defined in a JSON-formatted configuration file called UserDefinedViews.json. The Sync App automatically detects the views specified in this file.
You can also have multiple view definitions and control them using the UserDefinedViews connection property. When you use this property, only the specified views are seen by the Sync App.
This User Defined View configuration file is formatted as follows:
For example:
{ "MyView": { "query": "SELECT * FROM SampleTable_1 WHERE MyColumn = 'value'" }, "MyView2": { "query": "SELECT * FROM MyTable WHERE Id IN (1,2,3)" } }
Use the UserDefinedViews connection property to specify the location of your JSON configuration file. For example:
"UserDefinedViews", "C:\\Users\\yourusername\\Desktop\\tmp\\UserDefinedViews.json"
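Before pointing UserDefinedViews at a file, it can be useful to check that the JSON has the expected shape. The sketch below parses the example configuration shown above and maps each view name to its query; `load_views` is a hypothetical helper, and the check that every entry has a string `query` field is inferred from that example rather than from a published schema.

```python
import json
from typing import Dict

EXAMPLE = """
{
  "MyView": {"query": "SELECT * FROM SampleTable_1 WHERE MyColumn = 'value'"},
  "MyView2": {"query": "SELECT * FROM MyTable WHERE Id IN (1,2,3)"}
}
"""


def load_views(text: str) -> Dict[str, str]:
    """Map each view name to its query, rejecting malformed entries."""
    views = json.loads(text)
    if not isinstance(views, dict):
        raise ValueError("top level must be a JSON object of view definitions")
    result: Dict[str, str] = {}
    for name, definition in views.items():
        if not isinstance(definition, dict) or not isinstance(definition.get("query"), str):
            raise ValueError(f"view {name!r} needs a string 'query' field")
        result[name] = definition["query"]
    return result
```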