Amazon S3

Version 25.3.9396



You can use the Amazon S3 connector from the CData Sync application to capture data from Amazon S3 and move it to any supported destination. To do so, you need to add the connector, authenticate to Amazon S3, and complete your connection.

Supported File Formats

When Sync writes data to Amazon S3, you can choose the file format for the exported data. The following file formats are supported for the Amazon S3 destination:

  • (Default) CSV—Plain text comma-separated values.

  • Avro—A row-based binary format that supports schema evolution.

  • Parquet—A columnar storage format that is optimized for analytics.

Add the Amazon S3 Connector

To enable Sync to use data from Amazon S3, you first must add the connector, as follows:

  1. Open the Connections page of the Sync dashboard.

  2. Click Add Connection to open the Select Connectors page.

  3. Click the Sources tab and locate the Amazon S3 row.

  4. Click the Configure Connection icon at the end of that row to open the New Connection page. If the Configure Connection icon is not available, click the Download Connector icon to install the Amazon S3 connector. For more information about installing new connectors, see Connections.

Authenticate to Amazon S3

After you add the connector, you need to set the required properties.

  • Connection Name: Enter a connection name of your choice.

  • File Format: Select the file format that you want to use: CSV (default), Avro, or Parquet.

    Note: Although the Delta Parquet file format is listed in the available file formats, Sync does not support delta file formats for sources. This option appears in the UI for file-based source connectors only because the delta file format cannot be restricted based on whether the connector is a source or a destination.

  • URI: Enter the path of your bucket and folder (for example, s3://BucketName/FolderName).
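As a rough illustration of how a URI of this form breaks down, the following Python sketch splits it into a bucket name and a folder path. The helper name split_s3_uri is hypothetical and is not part of CData Sync; it only shows the structure of the value that the URI property expects.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, folder). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(
            f"Expected a URI such as s3://BucketName/FolderName, got {uri!r}"
        )
    # The parsed path starts with "/", which is not part of the folder name.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, folder = split_s3_uri("s3://BucketName/FolderName")
print(bucket, folder)
```

A check like this can catch a missing scheme or bucket name before the value is entered in the URI field.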

CData Sync supports authenticating to Amazon S3 in several ways. Select your authentication method below to proceed to the relevant section that contains the authentication details.

AWS Root Keys

To connect with your account root credentials, specify the following properties:

  • Auth Scheme: Select AwsRootKeys.

  • AWS Access Key: Enter your Amazon Web Services (AWS) account access key. You can locate this value on your AWS security credentials page.

  • AWS Secret Key: Enter your AWS account secret key. You can locate this value on your AWS security credentials page.

  • (Optional) MFA Serial Number: Enter the serial number for your multifactor authentication (MFA) device, if you are using such a device.

  • (Optional) MFA Token: Enter the temporary token that is available from your MFA device.

  • Temporary Token Duration: Enter the duration, in seconds, that you want for your temporary credentials. The default duration is 3600.
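To make the Temporary Token Duration setting concrete, the following sketch computes when credentials issued at a given moment would expire. The helper name credentials_expiry and the client-side calculation are illustrative assumptions, not a description of how Sync handles credentials internally.

```python
from datetime import datetime, timedelta, timezone

# Default stated above for Temporary Token Duration, in seconds.
DEFAULT_DURATION_SECONDS = 3600


def credentials_expiry(
    issued_at: datetime, duration_seconds: int = DEFAULT_DURATION_SECONDS
) -> datetime:
    """Return the moment at which temporary credentials would expire."""
    if duration_seconds <= 0:
        raise ValueError("duration must be a positive number of seconds")
    return issued_at + timedelta(seconds=duration_seconds)


issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(credentials_expiry(issued))       # uses the 3600-second default
print(credentials_expiry(issued, 900))  # a shorter, 15-minute duration
```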

AWS EC2 Roles

When you run CData Sync on an EC2 instance, CData Sync can authenticate by using the IAM role that is assigned to the instance. Select AwsEC2Roles for Auth Scheme to use that role. No additional properties are required.

AWS IAM Roles

To connect with your IAM user credentials, specify the following properties:

  • Auth Scheme: Select AwsIAMRoles.

  • AWS Access Key: Enter your Amazon Web Services (AWS) account access key. You can locate this value on your AWS security credentials page.

  • AWS Secret Key: Enter your AWS account secret key. You can locate this value on your AWS security credentials page.

  • AWS Role ARN: Enter the Amazon Resource Name (ARN) for the role with which you want to authenticate.

  • (Optional) AWS External Id: Enter the unique identifier that is required when you assume a role in another account.

  • (Optional) MFA Serial Number: Enter the serial number for your multifactor authentication (MFA) device, if you are using such a device.

  • (Optional) MFA Token: Enter the temporary token that is available from your MFA device.

  • Temporary Token Duration: Enter the duration, in seconds, that you want for your temporary credentials. The default duration is 3600.
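IAM role ARNs generally take the form arn:aws:iam::AccountId:role/RoleName. The following sketch checks a value against that shape before you enter it in the AWS Role ARN field. The helper name parse_role_arn, the account number, and the role name are placeholders chosen for illustration.

```python
import re

# Matches IAM role ARNs such as arn:aws:iam::123456789012:role/ExampleRole.
# The partition can also be a variant such as aws-cn or aws-us-gov.
ROLE_ARN_PATTERN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):iam::(?P<account>\d{12}):role/(?P<role>.+)$"
)


def parse_role_arn(arn: str) -> dict[str, str]:
    """Split an IAM role ARN into its parts, or raise ValueError."""
    match = ROLE_ARN_PATTERN.match(arn)
    if match is None:
        raise ValueError(f"Not an IAM role ARN: {arn!r}")
    return match.groupdict()


# Placeholder account number and role name.
print(parse_role_arn("arn:aws:iam::123456789012:role/ExampleRole"))
```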

Active Directory Federation Services

To connect with single sign-on (SSO) via Active Directory Federation Services (ADFS), specify the following properties:

  • Auth Scheme: Select ADFS.

  • User: Enter the username that you use to authenticate to your ADFS account.

  • Password: Enter the password that you use to authenticate to your ADFS account.

  • SSO Login URL: Enter the login URL that is used by your SSO provider.

  • Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.

  • (Optional) SSO Properties: Enter a semicolon-separated list of the single sign-on (SSO) properties that you want to use (for example, SSOProperty1=Value1;SSOProperty2=Value2;…).
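The SSO Properties value is a semicolon-separated list of name=value pairs. The sketch below builds and parses such a string. Both helper names are hypothetical, and because the escaping rules for special characters are not described here, the sketch simply rejects names and values that contain a delimiter.

```python
def format_sso_properties(properties: dict[str, str]) -> str:
    """Join name/value pairs as Name1=Value1;Name2=Value2."""
    for name, value in properties.items():
        if ";" in name or "=" in name or ";" in value:
            raise ValueError(f"Delimiter character in property {name!r}")
    return ";".join(f"{name}={value}" for name, value in properties.items())


def parse_sso_properties(text: str) -> dict[str, str]:
    """Parse Name1=Value1;Name2=Value2 back into a dictionary."""
    result: dict[str, str] = {}
    for pair in text.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        name, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"Missing '=' in {pair!r}")
        result[name.strip()] = value.strip()
    return result


text = format_sso_properties({"SSOProperty1": "Value1", "SSOProperty2": "Value2"})
print(text)
print(parse_sso_properties(text))
```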

Okta

To connect with single sign-on (SSO) via Okta, specify the following properties:

  • Auth Scheme: Select Okta.

  • User: Enter the username that you use to authenticate to your Okta account.

  • Password: Enter the password that you use to authenticate to your Okta account.

  • SSO Login URL: Enter the login URL that is used by your SSO provider.

  • Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.

  • (Optional) SSO Properties: Enter a semicolon-separated list of the single sign-on (SSO) properties that you want to use (for example, SSOProperty1=Value1;SSOProperty2=Value2;…).

PingFederate

To connect with single sign-on (SSO) via PingFederate, specify the following properties:

  • Auth Scheme: Select PingFederate.

  • User: Enter the username that you use to authenticate to your PingFederate account.

  • Password: Enter the password that you use to authenticate to your PingFederate account.

  • SSO Login URL: Enter the login URL that is used by your SSO provider.

  • SSO Exchange URL: Enter the Partner Service Identifier URI that is configured in your PingFederate server instance. The URI is available under SP Connections > SP Connection > WS-Trust > Protocol Settings.

  • Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.

  • (Optional) AWS Principal ARN: Enter the Amazon Resource Name (ARN) of the Security Assertion Markup Language (SAML) identity provider in your AWS account.

  • (Optional) SSO Properties: Enter a semicolon-separated list of the single sign-on (SSO) properties that you want to use (for example, SSOProperty1=Value1;SSOProperty2=Value2;…).

AWS Temporary Credentials

To connect with AWS temporary credentials, specify the following properties:

  • Auth Scheme: Select AwsTempCredentials.

  • AWS Access Key: Enter the access key that is associated with your Amazon Web Services (AWS) account. This value is accessible from your AWS security credentials page.

  • AWS Secret Key: Enter the secret key that is associated with your AWS account. This value is accessible from your AWS security credentials page.

  • AWS Session Token: Enter your AWS session token. This token is provided with your temporary credentials. For more information, see AWS Identity and Access Management: User Guide.
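Temporary credentials always come as a set of three values. AWS tools conventionally expose them through the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN. The sketch below, which uses a hypothetical helper name, collects those variables under the field names used above so that you can confirm that all three are present before you enter them.

```python
import os
from typing import Mapping

# Field names from this section mapped to the conventional AWS variables.
FIELD_TO_ENV = {
    "AWS Access Key": "AWS_ACCESS_KEY_ID",
    "AWS Secret Key": "AWS_SECRET_ACCESS_KEY",
    "AWS Session Token": "AWS_SESSION_TOKEN",
}


def temp_credential_fields(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the three temporary-credential values keyed by field name."""
    missing = [env for env in FIELD_TO_ENV.values() if not environ.get(env)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {field: environ[env] for field, env in FIELD_TO_ENV.items()}


# Placeholder values for illustration only.
example_env = {
    "AWS_ACCESS_KEY_ID": "EXAMPLEACCESSKEY",
    "AWS_SECRET_ACCESS_KEY": "EXAMPLESECRETKEY",
    "AWS_SESSION_TOKEN": "EXAMPLESESSIONTOKEN",
}
print(sorted(temp_credential_fields(example_env)))
```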

AWS Credentials File

To connect with a credentials file, specify the following properties:

  • Auth Scheme: Select AwsCredentialsFile.

  • AWS Credentials File: Enter the location of your Amazon Web Services (AWS) credentials file.

  • (Optional) AWS Credentials File Profile: Enter the name of the AWS profile that you want to use from the credentials file that you specify. If you do not enter a profile name, Sync uses the profile named default.
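An AWS credentials file is an INI-style text file with one section per profile. To check which profile a given file and profile name would select, a sketch like the following can help. The helper name read_profile and the sample key values are placeholders, and the fallback to the default profile mirrors the behavior described above rather than any Sync internals.

```python
import configparser
import tempfile
from pathlib import Path


def read_profile(credentials_path, profile: str = "default") -> dict[str, str]:
    """Return the settings stored under one profile of a credentials file."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(credentials_path):  # empty list means nothing was read
        raise FileNotFoundError(str(credentials_path))
    if not parser.has_section(profile):
        raise KeyError(f"Profile {profile!r} not found in {credentials_path}")
    return dict(parser[profile])


# Placeholder file contents for illustration only.
SAMPLE = """\
[default]
aws_access_key_id = EXAMPLEDEFAULTKEY
aws_secret_access_key = EXAMPLEDEFAULTSECRET

[reporting]
aws_access_key_id = EXAMPLEREPORTINGKEY
aws_secret_access_key = EXAMPLEREPORTINGSECRET
"""

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "credentials"
    path.write_text(SAMPLE)
    print(read_profile(path)["aws_access_key_id"])
    print(read_profile(path, "reporting")["aws_access_key_id"])
```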

Azure Active Directory

To connect with an Azure Active Directory (AD) user account, specify the following properties:

  • Auth Scheme: Select AzureAD.

  • Use Lake Formation: Select True if you want the AWS Lake Formation service to retrieve temporary credentials. These temporary credentials enforce access policies against the user based on the configured IAM role. You can use this service when you authenticate through AzureAD, Okta, ADFS, and PingFederate, while providing a Security Assertion Markup Language (SAML) assertion. The default setting for Use Lake Formation is False.

  • OAuth Client Id: Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.

  • OAuth Client Secret: Enter the client secret that you were assigned when you registered your application with an OAuth authorization server.

Complete Your Connection

To complete your connection:

  1. Specify the following properties:

    For all file formats:

    • Storage Base: Enter the URL of your cloud-storage service provider.

    For the CSV format:

    • FMT: Enter the format that you want to use to parse all text files. The default format is CsvDelimited.

    • Aggregate Files: Specify whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. The default option is False.

    • Include Column Headers: Specify whether you want to obtain column headers from the first lines of the specified files. The default option is True.

    For the Avro and Parquet formats:

    • Data Model: Select the data model that you want to use to parse documents for your format and to generate the database metadata. The default data model is Document.

    • Aggregate Files: Specify whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. The default option is False.

  2. Define advanced connection settings on the Advanced tab. (In most cases, though, you should not need these settings.)

  3. Click Create & Test to create your connection.
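To illustrate what the Aggregate Files and Include Column Headers options in step 1 mean conceptually, the following sketch groups CSV files by their header row and combines the rows of files that share the same columns. It is an illustration of the idea only, not Sync's implementation; the helper name and sample file contents are invented for the example.

```python
import csv
import io
from collections import defaultdict


def aggregate_csv_files(
    files: dict[str, str],
) -> dict[tuple[str, ...], list[dict[str, str]]]:
    """Group rows from CSV texts by their column headers (the schema)."""
    tables: dict[tuple[str, ...], list[dict[str, str]]] = defaultdict(list)
    for text in files.values():
        # The first line of each file supplies the column headers.
        reader = csv.DictReader(io.StringIO(text))
        schema = tuple(reader.fieldnames or ())
        tables[schema].extend(reader)
    return dict(tables)


# Invented sample files: two share a schema, one does not.
sample_files = {
    "a.csv": "id,name\n1,Ann\n",
    "b.csv": "id,name\n2,Bob\n",
    "c.csv": "sku,price\nX1,9.50\n",
}
for schema, rows in aggregate_csv_files(sample_files).items():
    print(schema, len(rows))
```

In this example, the rows of a.csv and b.csv end up in one combined table because their headers match, while c.csv stays separate.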

More Information

For more information about interactions between CData Sync and Amazon S3, see Amazon S3 Connector for CData Sync.