The AWS S3 connector, known by the type name `s3`, exposes stored procedures for working with resources stored in AWS S3.
Connector-specific Connection Properties
Name | Description |
---|---|
`keyId` | S3 key id |
`secretKey` | S3 secret key |
`bucketName` | S3 bucket name to work with |
`region` | S3 region (optional) |
Example
```sql
CALL SYSADMIN.createConnection(
    's3alias',
    's3',
    'region=eu-west-1, keyId=<id>, secretKey="<secret>", bucketName=dv-redshift-upload-test'
);;
CALL SYSADMIN.createDatasource(
    's3alias',
    'ufile',
    'importer.useFullSchemaName=false',
    null
);;
```
IAM Role Authorization
When IAM Role authorization is configured, the `keyId` and `secretKey` connector parameters can be omitted:
```sql
CALL SYSADMIN.createConnection(
    's3alias',
    's3',
    'region=eu-west-1, bucketName=dv-redshift-upload-test'
);;
CALL SYSADMIN.createDatasource(
    's3alias',
    'ufile',
    'importer.useFullSchemaName=false',
    null
);;
```
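IAM Role authorization typically relies on credentials supplied by the runtime environment (for example, an EC2 instance profile or an ECS task role), so it is applicable when the server itself runs on AWS infrastructure.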
Example
This example shows an IAM policy configured on the AWS side:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAccountLevelS3Actions",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:HeadBucket"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowListAndReadS3ActionOnMyBucket",
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::mk-s3-test/*",
                "arn:aws:s3:::mk-s3-test"
            ]
        }
    ]
}
```
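Note that for the connector to access the bucket under such a policy, the bucket referenced in the `Resource` ARNs (`mk-s3-test` above) must be the one configured via the `bucketName` connection property.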
Multi-part Upload
The AWS S3 connector can be configured to perform multi-part uploads using the following properties:
Name | Description | Default value |
---|---|---|
`multipartUpload` | Enables multi-part upload (optional) | false |
`numberOfThreads` | Number of threads for multi-part upload (optional) | 5 |
`partSize` | Part size for multi-part upload in bytes (optional) | 5MB |
The `partSize` can be set to any value between 5 MB and 5 TB. If the specified value is outside this range, it is automatically adjusted to 5 MB or 5 TB, respectively.
Example
```sql
CALL SYSADMIN.createConnection(
    's3alias',
    's3',
    'region=eu-west-1,keyId=<id>,secretKey="<secret>",bucketName=dv-redshift-upload-test,multipartUpload=true,partSize=1024,numberOfThreads=5'
);;
CALL SYSADMIN.createDatasource(
    's3alias',
    'ufile',
    'importer.useFullSchemaName=false',
    null
);;
```
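Note that in this example, `partSize=1024` (bytes) is below the 5 MB minimum, so by the rule above it is effectively raised to 5 MB.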
Prefix
The Prefix property enables limiting the result set (see the SDK documentation):

- The `Prefix` property value gets passed in `connectionOrResourceAdapterProperties`;
- All procedures of the connector automatically take the prefix into consideration (e.g. calling `listFiles(pathAndPattern => NULL)` still applies the prefix from the data source settings);
- If the data source has a prefix configured and a `pathAndPattern` gets passed, the values are concatenated: for example, if the data source is configured with the prefix `a/b` and `listFiles(pathAndPattern => 'c/d')` gets called, this results in `a/b/c/d` (see the sketch after this list).
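A minimal sketch of this behavior, assuming the prefix is set as a `prefix` key inside `connectionOrResourceAdapterProperties` (the exact property name is an assumption, as is the `CALL "alias.procedure"` invocation style) and reusing the bucket and credentials from the examples above:

```sql
-- Hypothetical connection configured with the prefix a/b
-- (the property name 'prefix' is an assumption)
CALL SYSADMIN.createConnection(
    's3prefixed',
    's3',
    'region=eu-west-1, keyId=<id>, secretKey="<secret>", bucketName=dv-redshift-upload-test, prefix=a/b'
);;
CALL SYSADMIN.createDatasource(
    's3prefixed',
    'ufile',
    'importer.useFullSchemaName=false',
    null
);;
-- Per the concatenation rule above, this call lists objects under a/b/c/d
CALL "s3prefixed.listFiles"(pathAndPattern => 'c/d');;
```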
Ceph Support
Ceph is an open-source distributed storage solution that can expose an S3-compatible API. Please note that for the CData Virtuality S3 connector to work with Ceph, the RADOS Gateway (RGW) service must be configured.
A data source connected to Ceph via the S3 API can be configured with the following properties:
Name | Description |
---|---|
`endPoint` | Mandatory in the case of Ceph; otherwise, the S3 API will use its Amazon endpoints by default |
`pathStyleAccess` | Mandatory in the case of Ceph if the DNS is not configured on the server running it; otherwise, by default, the S3 library will add the bucket name to the initial endpoint |
Example
```sql
CALL SYSADMIN.createConnection(
    name => 'test_ceph_rgw',
    jbossCLITemplateName => 's3',
    connectionOrResourceAdapterProperties => 'endPoint=<endPoint>,keyId=<keyID>,secretKey=<secretKey>,bucketName=<bucketName>,pathStyleAccess=true'
);;
CALL SYSADMIN.createDataSource(
    name => 'test_ceph_rgw',
    translator => 'ufile',
    modelProperties => 'importer.useFullSchemaName=false',
    translatorProperties => ''
);;
```