Connecting to CSV Data Sources
Connecting to CSV
Below are example connection strings for CSV files and streams, using the component's default data modeling configuration (described below).
| Service provider | URI formats | Connection example |
| --- | --- | --- |
| Local | Single File Path (one table): file://localPath<br>Directory Path (one table per file): file://localPath | URI=C:/folder1; |
| HTTP or HTTPS | http://remoteStream<br>https://remoteStream | URI=http://www.host1.com/streamname1; |
| Amazon S3 | Single File Path (one table): s3://remotePath<br>Directory Path (one table per file): s3://remotePath | URI=s3://bucket1/folder1; AWSSecretKey=secret1; AWSRegion=OHIO; |
| Azure Blob Storage | azureblob://mycontainer/myblob/ | URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; AzureAccessKey=myKey;<br>URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; |
| Google Drive | Single File Path (one table): gdrive://remotePath or gdrive://SharedWithMe/remotePath<br>Directory Path (one table per file): gdrive://remotePath or gdrive://SharedWithMe/remotePath | URI=gdrive://folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth;<br>URI=gdrive://SharedWithMe/folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; |
| OneDrive | Single File Path (one table): onedrive://remotePath or onedrive://SharedWithMe/remotePath<br>Directory Path (one table per file): onedrive://remotePath or onedrive://SharedWithMe/remotePath | URI=onedrive://folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth;<br>URI=onedrive://SharedWithMe/folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; |
| Box | Single File Path (one table): box://remotePath<br>Directory Path (one table per file): box://remotePath | URI=box://folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; |
| Dropbox | Single File Path (one table): dropbox://remotePath<br>Directory Path (one table per file): dropbox://remotePath | URI=dropbox://folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; OAuthClientId=oauthclientid1; OAuthClientSecret=oauthclientsecret1; CallbackUrl=http://localhost:12345; |
| SharePoint SOAP | Single File Path (one table): sp://remotePath<br>Directory Path (one table per file): sp://remotePath | URI=sp://Documents/folder1; User=user1; Password=password1; StorageBaseURL=https://subdomain.sharepoint.com; |
| SharePoint REST | Single File Path (one table): sprest://remotePath<br>Directory Path (one table per file): sprest://remotePath | URI=sprest://Documents/folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; StorageBaseURL=https://subdomain.sharepoint.com; |
| FTP or FTPS | Single File Path (one table): ftp://server:port/remotePath or ftps://server:port/remotePath<br>Directory Path (one table per file): ftp://server:port/remotePath or ftps://server:port/remotePath | URI=ftps://localhost:990/folder1; User=user1; Password=password1; |
| SFTP | Single File Path (one table): sftp://server:port/remotePath<br>Directory Path (one table per file): sftp://server:port/remotePath | URI=sftp://127.0.0.1:22/folder1; User=user1; Password=password1;<br>URI=sftp://127.0.0.1:22/folder1; SSHAuthMode=PublicKey; SSHClientCert=myPrivateKey; |
| Azure Data Lake Store Gen1 | adl://remotePath<br>adl://Account.azuredatalakestore.net@remotePath | URI=adl://folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; AzureStorageAccount=myAccount; AzureTenant=tenant;<br>URI=adl://myAccount.azuredatalakestore.net@folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; AzureTenant=tenant; |
| Azure Data Lake Store Gen2 | abfs://myfilesystem/remotePath<br>abfs://myfilesystem@myaccount.dfs.core.windows.net/remotePath | URI=abfs://myfilesystem/folder1; AzureStorageAccount=myAccount; AzureAccessKey=myKey;<br>URI=abfs://myfilesystem@myaccount.dfs.core.windows.net/folder1; AzureAccessKey=myKey; |
| Azure Data Lake Store Gen2 with SSL | abfss://myfilesystem/remotePath<br>abfss://myfilesystem@myaccount.dfs.core.windows.net/remotePath | URI=abfss://myfilesystem/folder1; AzureStorageAccount=myAccount; AzureAccessKey=myKey;<br>URI=abfss://myfilesystem@myaccount.dfs.core.windows.net/folder1; AzureAccessKey=myKey; |
| Wasabi | Single File Path (one table): wasabi://bucket1/remotePath<br>Directory Path (one table per file): wasabi://bucket1/remotePath | URI=wasabi://bucket/folder1; AccessKey=token1; SecretKey=secret1; Region='us-west-1'; |
| Google Cloud Storage | Single File Path (one table): gs://bucket/remotePath<br>Directory Path (one table per file): gs://bucket/remotePath | URI=gs://bucket/folder1; InitiateOAuth=GETANDREFRESH; AuthScheme=OAuth; ProjectId=test; |
| Oracle Cloud Storage | Single File Path (one table): os://bucket/remotePath<br>Directory Path (one table per file): os://bucket/remotePath | URI=os://bucket/folder1; AccessKey='myKey'; SecretKey='mySecretKey'; OracleNameSpace='myNameSpace'; Region='us-west-1'; |
| Azure File | Single File Path (one table): azurefile://fileShare/remotePath<br>Directory Path (one table per file): azurefile://fileShare/remotePath | URI=azurefile://bucket/folder1; AzureStorageAccount='myAccount'; AzureAccessKey='mySecretKey';<br>URI=azurefile://bucket/folder1; AzureStorageAccount='myAccount'; AzureSharedAccessSignature='mySharedAccessSignature'; |
| IBM Object Storage Source | Single File Path (one table): ibmobjectstorage://bucket1/remotePath<br>Directory Path (one table per file): ibmobjectstorage://bucket1/remotePath | URI=ibmobjectstorage://bucket/folder1; AuthScheme='HMAC'; AccessKey=token1; SecretKey=secret1; Region='eu-gb';<br>URI=ibmobjectstorage://bucket/folder1; ApiKey=key1; Region='eu-gb'; AuthScheme=OAuth; InitiateOAuth=GETANDREFRESH; |
| Hadoop Distributed File System | Single File Path (one table): webhdfs://host:port/remotePath<br>Directory Path (one table per file): webhdfs://host:port/remotePath | URI=webhdfs://host:port/folder1; |
| Secure Hadoop Distributed File System | Single File Path (one table): webhdfss://host:port/remotePath<br>Directory Path (one table per file): webhdfss://host:port/remotePath | URI=webhdfss://host:port/folder1; |
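To show how these connection strings are consumed, here is a minimal sketch in Java using JDBC. The jdbc:csv: URL prefix, the table name tableA, and the presence of the component's JDBC driver on the classpath are assumptions for illustration; substitute whatever your edition of the component documents.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CsvConnectExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical JDBC URL prefix; the portion after "jdbc:csv:"
        // matches the connection-string examples in the table above.
        String url = "jdbc:csv:URI=C:/folder1;";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // With a directory URI, each file in C:/folder1 is modeled as a
             // table named after the file (here, a hypothetical tableA).
             ResultSet rs = stmt.executeQuery("SELECT * FROM tableA")) {
            while (rs.next()) {
                // Print the first column of each row.
                System.out.println(rs.getString(1));
            }
        }
    }
}
```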
Dynamically Detecting Schemas
The following properties control how the component automatically models CSV files as tables when you connect (a combined example follows the list):
- IncludeColumnHeaders: Set this to read column names from the first line of the specified files (the default). Otherwise, columns are named by their ordinal position.
- FMT: Set this to the format used to parse the text files: CsvDelimited (the default) or TabDelimited.
- IncludeFiles: Set this to a comma-separated list of file extensions to include in the set of files modeled as tables. (By default, .txt, .tab, and .csv files are modeled.)
- RowScanDepth: Set this to the number of rows to scan when automatically determining column data types.
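For example, the following connection string reads headerless, tab-delimited files with a .tsv extension and scans the first 100 rows of each file to infer column data types. The property names come from the list above; the values are illustrative assumptions:

URI=C:/folder1; FMT=TabDelimited; IncludeColumnHeaders=False; IncludeFiles=TSV; RowScanDepth=100;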
When working with local CSV files, you can also use Schema.ini files, compatible with the Microsoft Jet driver, to define column names and data types. See Using Schema.ini for a guide.
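For instance, a Schema.ini entry like the following (hypothetical file and column names, written in standard Jet Schema.ini syntax) pins down the format and column types for one file in the folder:

```ini
[tableA.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=Id Integer
Col2=Name Text Width 50
Col3=CreatedOn DateTime
```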
Customizing Schemas
To customize column data types and other aspects of the schemas, you can save the schemas to static configuration files. The configuration files have a simple format that makes them easy to extend. For more information on extending the component schemas, see Generating Schema Files.
Accessing Sub-Folders
Set the following properties to model subfolders as views:
- IncludeSubdirectories: Set this to read files and Schema.ini entries from nested folders. By default, this is false. In the case of a name collision, table names are prefixed with the underscore-separated names of their parent folders.
- DirectoryRetrievalDepth: Set this to the number of subfolder levels to scan recursively when IncludeSubdirectories is set. By default, the component scans all subfolders.
When IncludeSubdirectories is set, the automatically detected table names follow the convention below:
| File Path | Table Name |
| --- | --- |
| Root\subfolder1\tableA | subfolder1_tableA |
| Root\subfolder1\subfolder2\tableA | subfolder1_subfolder2_tableA |
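For example, given the layout above, a connection string such as URI=C:/Root; IncludeSubdirectories=True; DirectoryRetrievalDepth=2; (illustrative values) would expose subfolder1_tableA and subfolder1_subfolder2_tableA alongside the tables defined in the root folder.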