The Parquet connector enables exporting data in Parquet format to the local filesystem.
Parquet Connector Data Source Creation
CALL SYSADMIN.createConnection(
    name => <parquetalias>,
    jbossCLITemplateName => 'ufile',
    connectionOrResourceAdapterProperties => 'ParentDirectory="directory"'
) ;;
CALL SYSADMIN.createDataSource(
    name => <parquetalias>,
    translator => 'parquet',
    modelProperties => null,
    translatorProperties => null
) ;;
The parquet translator is compatible with several connectors, i.e. different storage types can be used for .parquet files:
- ufile - local file storage
- ftp - FTP file storage
- sftp - SFTP file storage
- scp - SCP file storage
- s3 - Amazon S3 file storage
- blob - Azure Blob file storage
ufile Connector
CALL SYSADMIN.createConnection(
    name => 'parquet_ufile',
    jbossCLITemplateName => 'ufile',
    connectionOrResourceAdapterProperties => 'ParentDirectory="D:/parquet"'
) ;;
CALL SYSADMIN.createDataSource(
    name => 'parquet_ufile',
    translator => 'parquet',
    modelProperties => 'importer.loadMetadata=true',
    translatorProperties => null
) ;;
ftp Connector
CALL SYSADMIN.createConnection(
    name => 'parquet_ftp',
    jbossCLITemplateName => 'ftp',
    connectionOrResourceAdapterProperties => 'host=localhost,port=21,secure=false,explicitTls=false,passive=true,user=<ftpUser>,password=<password>'
) ;;
CALL SYSADMIN.createDataSource(
    name => 'parquet_ftp',
    translator => 'parquet',
    modelProperties => 'importer.loadMetadata=true',
    translatorProperties => null
) ;;
sftp Connector
CALL SYSADMIN.createConnection(
    name => 'parquet_sftp',
    "jbossCLITemplateName" => 'sftp',
    "connectionOrResourceAdapterProperties" => 'host=localhost,port=2022,user=<ftpUser>,password=<password>',
    "encryptedProperties" => ''
) ;;
CALL SYSADMIN.createDatasource(
    name => 'parquet_sftp',
    translator => 'parquet',
    modelProperties => 'importer.loadMetadata=true',
    translatorProperties => '',
    encryptedModelProperties => '',
    encryptedTranslatorProperties => ''
) ;;
scp Connector
CALL SYSADMIN.createConnection(
    name => 'parquet_scp',
    "jbossCLITemplateName" => 'scp',
    "connectionOrResourceAdapterProperties" => 'port=2022,host=localhost,decompressCompressedFiles=false,user=<ftpUser>,password=<password>',
    "encryptedProperties" => ''
) ;;
CALL SYSADMIN.createDatasource(
    name => 'parquet_scp',
    translator => 'parquet',
    modelProperties => 'importer.loadMetadata=true',
    translatorProperties => '',
    encryptedModelProperties => '',
    encryptedTranslatorProperties => ''
) ;;
s3 Connector
CALL SYSADMIN.createConnection(
    name => 'parquet_s3',
    jbossCLITemplateName => 's3',
    connectionOrResourceAdapterProperties => 'region=<region>,keyId=<keyId>,secretKey=<secretKey>,bucketName=<bucketName>'
) ;;
CALL SYSADMIN.createDatasource(
    name => 'parquet_s3',
    translator => 'parquet',
    modelProperties => 'importer.loadMetadata=true',
    translatorProperties => '',
    encryptedModelProperties => '',
    encryptedTranslatorProperties => ''
) ;;
blob Connector
CALL SYSADMIN.createConnection(
    name => 'parquet_blob',
    jbossCLITemplateName => 'blob',
    connectionOrResourceAdapterProperties => 'accountName=<accountName>,accountKey=<accountKey>,defaultEndpointsProtocol=https,containerName=<containerName>'
) ;;
CALL SYSADMIN.createDatasource(
    name => 'parquet_blob',
    translator => 'parquet',
    modelProperties => 'importer.loadMetadata=true',
    translatorProperties => '',
    encryptedModelProperties => '',
    encryptedTranslatorProperties => ''
) ;;
Model Properties
Name | Description | Default value |
---|---|---|
| When set to |
|
Translator Properties
Name | Description | Default value |
---|---|---|
| Compression method for parquet file format. Possible values: Only applies to writing to files. Files compressed differently from the one configured can still be read |
|
| When set to When set to
When new data is inserted into the table a new file is created. This setting only applies to creating new tables/files or inserting data |
|
Usage
The Parquet connector can manage data represented as single files or collections of files within a folder. Files created outside the CData Virtuality Server will still be handled as tables:
- a single file with a .parquet extension is represented as a table; when new data is inserted, the file is overwritten;
- multiple files with a .parquet extension inside a folder that itself has a .parquet extension are treated as a single table with the same name as the folder; file naming inside the folder does not matter, and new files are added upon insert.
New files will be created according to the writeSingleFile translator property.
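As a rough illustration of where this property is set, the sketch below passes writeSingleFile through translatorProperties when creating the data source. The data source name parquet_single, the directory, and the value true are assumptions made for this example; the accepted values are those listed under Translator Properties.

```sql
-- Hypothetical data source name and directory, chosen for illustration only.
CALL SYSADMIN.createConnection(
    name => 'parquet_single',
    jbossCLITemplateName => 'ufile',
    connectionOrResourceAdapterProperties => 'ParentDirectory="/tmp/parquet_single"'
) ;;

-- Assumption: writeSingleFile takes a boolean-style value in the same
-- key=value form that this page uses for modelProperties.
CALL SYSADMIN.createDataSource(
    name => 'parquet_single',
    translator => 'parquet',
    modelProperties => 'importer.loadMetadata=true',
    translatorProperties => 'writeSingleFile=true'
) ;;
```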
Data is exported using the SELECT INTO command:
SELECT *
INTO <parquet data source name>.<table name>
FROM ...
The data will be exported into the folder specified in the path connection property (for example, ParentDirectory for the ufile connector). The table is represented by a folder named according to the following pattern: <parquet data source name>_<table name>.parquet. The folder contains files named like <table name>_<UID>.parquet. When new data is inserted into a table, a new file containing the new rows is created in the respective table folder alongside the existing files.
You can also create a table using the CREATE TABLE statement. However, the physical file will only be created when some data is inserted into this table using the INSERT VALUES or INSERT SELECT statement.
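A minimal sketch of this sequence, assuming a Parquet data source named parquet_1 already exists; the table name example_notes and its columns are hypothetical and only serve the illustration.

```sql
-- Defines the table; per the text above, no physical file exists yet.
CREATE TABLE parquet_1.example_notes (id integer, note string) ;;

-- The first insert is what causes the Parquet file to be written.
INSERT INTO parquet_1.example_notes (id, note) VALUES (1, 'first row') ;;
```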
Example
CALL SYSADMIN.createConnection(
    name => 'parquet_1',
    jbossCLITemplateName => 'ufile',
    connectionOrResourceAdapterProperties => 'ParentDirectory="/home/exportuser/examples"'
) ;;
CALL SYSADMIN.createDataSource(
    name => 'parquet_1',
    translator => 'parquet',
    modelProperties => 'importer.loadMetadata=true',
    translatorProperties => null
) ;;
SELECT *
INTO parquet_1.example_salesorderdetail
FROM adventureworks.salesorderdetail ;;
As a result of this call, the content of the salesorderdetail table in the adventureworks schema will be exported into a file named something like example_salesorderdetail_1e04e8d5-f963-11ed-a1bc-0a0027000003.parquet in the /home/exportuser/examples/parquet_1.example_salesorderdetail.parquet folder.
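Continuing the example, the exported table could then be queried or extended as sketched below. This assumes the parquet_1 data source created above; the effects described in the comments follow the Usage section and are not verified output.

```sql
-- Read back the exported rows.
SELECT * FROM parquet_1.example_salesorderdetail ;;

-- Insert more rows; per the Usage section, this adds a new file to the
-- table folder instead of rewriting the existing ones.
INSERT INTO parquet_1.example_salesorderdetail
SELECT * FROM adventureworks.salesorderdetail ;;
```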
The following changes were introduced in v.3.9:
- ufile jbossCLITemplateName is used for creating Parquet data sources;
- importer.loadMetadata model property is available;
- Tables are stored in dedicated folders;
- Files are not re-written when inserting data;
- Reading from Parquet tables is possible.
The compression translator property is available since v4.5.
See Also
Parquet File Creation and S3 Storage with Data Virtuality to learn how to take any data source table and create a local Parquet file.
Query Parquet Files in Data Virtuality Using Amazon Athena for information on how to read from Parquet.