CData Cloud offers access to Google BigQuery across several standard services and protocols, in a cloud-hosted solution. Any application that can connect to a MySQL or SQL Server database can connect to Google BigQuery through CData Cloud.
CData Cloud allows you to standardize and configure connections to Google BigQuery as though it were any other OData endpoint, or standard SQL Server/MySQL database.
This page provides a guide to Establishing a Connection to Google BigQuery in CData Cloud, as well as information on the available resources, and a reference to the available connection properties.
Establishing a Connection shows how to authenticate to Google BigQuery and configure any necessary connection properties to create a database in CData Cloud.
Accessing data from Google BigQuery through the available standard services and CData Cloud administration is documented in further detail in the CData Cloud Documentation.
Connect to Google BigQuery by selecting the corresponding icon in the Database tab. Required properties are listed under Settings. The Advanced tab lists connection properties that are not typically required.
The Cloud supports using user accounts and GCP instance accounts for authentication.
The following sections discuss the available authentication schemes for Google BigQuery:
AuthScheme must be set to OAuth in all user account flows.
Get an OAuth Access Token
Set the following connection properties to obtain the OAuthAccessToken:
Then call stored procedures to complete the OAuth exchange:
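For illustration, a typical exchange using these stored procedures might look like the following sketch (the CallbackURL and verifier values are placeholders; consult the stored procedure reference for the exact parameters):
EXEC GetOAuthAuthorizationURL CallbackURL = 'http://localhost:33333'
-- Open the returned URL in a browser and authorize, then exchange the verifier code it returns:
EXEC GetOAuthAccessToken AuthMode = 'WEB', Verifier = '<verifier code>', CallbackURL = 'http://localhost:33333'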
Once you have obtained the access and refresh tokens, you can connect to data and refresh the OAuth access token either automatically or manually.
Automatic Refresh of the OAuth Access Token
To have the driver automatically refresh the OAuth access token, set the following on the first data connection:
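As a sketch, assuming the standard CData OAuth properties, a first connection configured for automatic refresh might include values such as:
InitiateOAuth=REFRESH;OAuthClientId=<client id>;OAuthClientSecret=<client secret>;OAuthAccessToken=<access token>;OAuthRefreshToken=<refresh token>;OAuthSettingsLocation=<path to settings file>;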
Manual Refresh of the OAuth Access Token
The only value needed to manually refresh the OAuth access token when connecting to data is the OAuth refresh token.
Use the RefreshOAuthAccessToken stored procedure to manually refresh the OAuthAccessToken after the ExpiresIn parameter value returned by GetOAuthAccessToken has elapsed, then set the following connection properties:
Then call RefreshOAuthAccessToken with OAuthRefreshToken set to the OAuth refresh token returned by GetOAuthAccessToken. After the new tokens have been retrieved, open a new connection by setting the OAuthAccessToken property to the value returned by RefreshOAuthAccessToken.
Finally, store the OAuth refresh token so that you can use it to manually refresh the OAuth access token after it has expired.
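A minimal sketch of the manual refresh call, assuming the refresh token was saved from the earlier GetOAuthAccessToken call:
EXEC RefreshOAuthAccessToken OAuthRefreshToken = '<refresh token>'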
Option 1: Obtain and Exchange a Verifier Code
To obtain a verifier code, you must authenticate at the OAuth authorization URL.
Follow the steps below to authenticate from the machine with an Internet browser and obtain the OAuthVerifier connection property.
On the headless machine, set the following connection properties to obtain the OAuth authentication values:
After the OAuth settings file is generated, you need to re-set the following properties to connect:
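For example, assuming the standard CData properties, the reconnection settings typically point at the generated settings file (the path here is a placeholder):
InitiateOAuth=REFRESH;OAuthSettingsLocation=/path/to/OAuthSettings.txt;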
Option 2: Transfer OAuth Settings
Prior to connecting on a headless machine, you need to create and install a connection with the driver on a device that supports an Internet browser. Set the connection properties as described in "Desktop Applications" above.
After completing the instructions in "Desktop Applications", the resulting authentication values are encrypted and written to the location specified by OAuthSettingsLocation. The default filename is OAuthSettings.txt.
Once you have successfully tested the connection, copy the OAuth settings file to your headless machine.
On the headless machine, set the following connection properties to connect to data:
When running on a GCP virtual machine, the Cloud can authenticate using a service account tied to the virtual machine. To use this mode, set AuthScheme to GCPInstanceAccount.
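For example, a minimal connection string for this mode might look like the following (the ProjectId and DatasetId values are placeholders):
AuthScheme=GCPInstanceAccount;ProjectId=psychic-valve-137816;DatasetId=Northwind;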
The following sections detail Cloud settings that may be needed in advanced integrations.
Large result sets must be saved in a temporary or permanent table. You can use the following properties to control table persistence:
Enable the AllowLargeResultSets property to make the Cloud automatically create destination tables when needed. If a query result is too large to fit in the BigQuery query cache, the Cloud creates a hidden dataset within the data project and re-executes the query with a destination table in that dataset. The dataset is configured so that all tables created within it expire in 24 hours.
In some situations you may want to change the name of the dataset created by the Cloud, for example, when multiple users share the Cloud but do not have permissions to write to datasets created by other users. See TempTableDataset for details on how to do this.
Enable the DestinationTable property to make the Cloud write query results to the given table. Writing query results to a single table imposes several limitations that you should keep in mind when using this option:
Set MaximumBillingTier to override your project limits on the maximum cost for any given query in a connection.
Google BigQuery provides several interfaces for operating on batches of rows. The Cloud supports these methods through the InsertMode option, each of which is specialized for different use cases:
In addition to bulk INSERTs, the Cloud also supports performing bulk UPDATE and DELETE operations. This requires the Cloud to upload the filter values and new row values into a temporary table in BigQuery, perform a MERGE between the two tables, and then drop the temporary table. InsertMode determines how the rows are inserted into the temporary table, but the Streaming and DML modes are not supported.
In most cases the Cloud can determine what columns need to be part of the SET vs. WHERE clauses of a bulk update. If you receive an error like "Primary keys must be defined for bulk UPDATE support," you can use PrimaryKeyIdentifiers to tell the Cloud what columns to treat as keys. In an update the values of key columns are used only to find matching rows and cannot be updated.
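For illustration, assuming a hypothetical Customers table keyed on CustomerId, a bulk UPDATE uses the key column only to match rows:
UPDATE `Northwind`.`customers` SET Phone = '555-0100' WHERE CustomerId = 'ALFKI'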
By default, the Cloud attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store.
To specify another certificate, see the SSLServerCert property for the available formats.
To connect through the Windows system proxy, you do not need to set any additional connection properties. To connect to other proxies, set ProxyAutoDetect to false.
To authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword, in addition to ProxyServer and ProxyPort.
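For example, a connection through a hypothetical HTTP proxy might set the following (host, port, and credentials are placeholders):
ProxyAutoDetect=false;ProxyServer=proxy.example.com;ProxyPort=8080;ProxyAuthScheme=BASIC;ProxyUser=proxyuser;ProxyPassword=proxypassword;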
Set the following properties:
The CData Cloud models the data as defined within Google BigQuery for the ProjectId and DatasetId configured.
Views are client-side tables that cannot be modified. The Cloud uses these to report metadata about the Google BigQuery projects and datasets it is connected to.
In addition, the Cloud supports server-side views defined within Google BigQuery. These views may be used in SELECT statements the same way as tables. However, view schemas can easily become out of date and require the Cloud to refresh them. Please see RefreshViewSchemas for more details.
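For example, a server-side view can be queried like any table (the view name here is hypothetical):
SELECT * FROM `psychic-valve-137816`.`Northwind`.`TopCustomers`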
Google BigQuery allows creating external datasets that store data in Amazon S3 regions (like aws-us-east-1) or Azure Storage regions (like azure-eastus2). The Cloud supports these datasets with two major limitations:
Stored Procedures are function-like interfaces to the data source. The Cloud uses these to manage Google BigQuery tables and jobs and to perform OAuth operations.
In addition to the client-side stored procedures offered by the Cloud, there is also support for server-side stored procedures defined in Google BigQuery.
The Cloud supports both CALL and EXEC using the procedure's parameter names.
Note that the Cloud only supports IN parameters and resultset return values.
CALL `psychic-valve-137816`.Northwind.MostPopularProduct()
CALL `psychic-valve-137816`.Northwind.GetStockedValue(24, 0.75)
EXEC `psychic-valve-137816`.Northwind.MostPopularProduct
EXEC `psychic-valve-137816`.Northwind.GetStockedValue productId = 24, discountRate = 0.75
Google BigQuery supports setting descriptions on tables but the Cloud does not report these by default. ShowTableDescriptions can be used to report table descriptions.
Google BigQuery does not support primary keys natively, but the Cloud allows you to define them so they can be used in environments that require primary keys to modify data. Primary keys can be defined using the PrimaryKeyIdentifiers option.
If policy tags from the Data Catalog service are defined on a table, they can be retrieved from the system tables using the PolicyTags column:
SELECT ColumnName, PolicyTags FROM sys_tablecolumns WHERE CatalogName = 'psychic-valve-137816' AND SchemaName = 'Northwind' AND TableName = 'Customers'
Table definitions are dynamically generated based on the table definitions within Google BigQuery for the Project and Dataset specified in the connection string options.
Views are similar to tables in the way that data is represented; however, views are read-only.
Queries can be executed against a view as if it were a normal table.
Name | Description |
Datasets | Lists all the accessible datasets for a given project. |
PartitionsList | Lists the partitioning definitions for tables. |
PartitionsValues | Lists the partitioning ranges for tables. |
Projects | Lists all the projects for the authorized user. |
Lists all the accessible datasets for a given project.
Name | Type | Description |
Id [KEY] | String | The fully qualified, unique, opaque Id of the dataset. |
Kind | String | The resource type. |
FriendlyName | String | A descriptive name for the dataset. |
DatasetReference_ProjectId | String | A unique reference to the container project. |
DatasetReference_DatasetId | String | A unique reference to the dataset, without the project name. |
Lists the partitioning definitions for tables.
Name | Type | Description |
Id [KEY] | String | A unique identifier for the partition. |
ProjectId | String | The project that the table belongs to. |
DatasetId | String | The dataset that the table belongs to. |
TableName | String | The name of the table. |
ColumnName | String | The name of the column used for partitioning. |
ColumnType | String | The type of the partitioning column. |
Kind | String | The type of partitioning used by the table. One of DATE, RANGE or INGESTION. |
RequireFilter | Boolean | Whether a filter on the partition column is required to query the table. |
Lists the partitioning ranges for tables.
Name | Type | Description |
Id | String | A unique identifier for the partition. |
RangeLow | String | The lowest value of the partition column. Either an integer when Kind is RANGE, or a date otherwise. |
RangeHigh | String | The highest value of the partition column. Either an integer when Kind is RANGE, or a date otherwise. |
RangeInterval | String | The range of values which are included in each partition. Only valid when Kind is RANGE. |
DateResolution | String | How much of the date is significant to a TIME or INGESTION partition column. One of DAY, HOUR, MONTH or YEAR. |
Lists all the projects for the authorized user.
Name | Type | Description |
Id [KEY] | String | The unique identifier of the project. |
Kind | String | The resource type. |
FriendlyName | String | A descriptive name for the project. |
NumericId | String | The numeric Id of the project. |
ProjectReference_ProjectId | String | A unique reference to the project. |
Stored procedures are function-like interfaces that extend the functionality of the Cloud beyond simple SELECT/INSERT/UPDATE/DELETE operations with Google BigQuery.
Stored procedures accept a list of parameters, perform their intended function, and then return any relevant response data from Google BigQuery, along with an indication of whether the procedure succeeded or failed.
Name | Description |
CancelJob | Cancels a running BigQuery job. |
DeleteTable | Deletes the specified table from Google BigQuery. |
GetJob | Retrieves the configuration information and execution state for an existing job. |
InsertJob | Inserts a Google BigQuery job, which can then be selected later to retrieve the query results. |
InsertLoadJob | Inserts a Google BigQuery load job, which adds data from Google Cloud Storage into an existing table. |
Cancels a running BigQuery job.
Name | Type | Description |
JobId | String | The Id of the job you wish to cancel. |
Region | String | The region where the job is executing. Not required if the job is in the US or EU region. |
Name | Type | Description |
JobId | String | The JobId of the cancelled Job. |
Region | String | The region where the job was executing. |
Configuration_query_query | String | The query of the cancelled job. |
Configuration_query_destinationTable_tableId | String | The destination table tableId of the cancelled job. |
Configuration_query_destinationTable_projectId | String | The destination table projectId of the cancelled job. |
Configuration_query_destinationTable_datasetId | String | The destination table datasetId of the cancelled job. |
Status_State | String | Running state of the job. |
Status_errorResult_reason | String | A short error code that summarizes the error. |
Status_errorResult_message | String | A human-readable description of the error. |
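For illustration, a call to cancel a hypothetical job might look like:
EXEC CancelJob JobId = 'job_abc123', Region = 'us'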
Deletes the specified table from Google BigQuery.
Name | Type | Description |
TableId | String | TableId of the table you wish to delete. ProjectId and DatasetId can come from connection properties; to override them, use the format projectId:datasetId.TableId (see the example below). |
Name | Type | Description |
Success | String | Returns true if the operation is successful; otherwise an exception is thrown. |
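For illustration, a call using the qualified format described above (the table name is hypothetical):
EXEC DeleteTable TableId = 'psychic-valve-137816:Northwind.OldOrders'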
Retrieves the configuration information and execution state for an existing job.
Name | Type | Description |
JobId | String | The Id of the job you wish to return. |
Region | String | The region where the job is executing. Not required if the job is in the US or EU region. |
Name | Type | Description |
JobId | String | The JobId of the retrieved job. |
Region | String | The region where the job is executing. |
Configuration_query_query | String | The query of the retrieved job. |
Configuration_query_destinationTable_tableId | String | The destination table tableId of the retrieved job. |
Configuration_query_destinationTable_projectId | String | The destination table projectId of the retrieved job. |
Configuration_query_destinationTable_datasetId | String | The destination table datasetId of the retrieved job. |
Status_State | String | Running state of the job. |
Status_errorResult_reason | String | A short error code that summarizes the error. |
Status_errorResult_message | String | A human-readable description of the error. |
Inserts a Google BigQuery job, which can then be selected later to retrieve the query results.
Name | Type | Description |
Query | String | The query to submit to Google BigQuery. |
IsDML | String | Should be true if the query is a DML statement and false otherwise.
The default value is false. |
DestinationTable | String | The destination table for the query, in the format DestProjectId:DestDatasetId.DestTable |
WriteDisposition | String | How to write data to the destination table, such as truncate existing results, appending existing results, or writing only when the table is empty.
The allowed values are WRITE_TRUNCATE, WRITE_APPEND, WRITE_EMPTY. The default value is WRITE_TRUNCATE. |
DryRun | String | Whether or not this is a dry run of the job. |
MaximumBytesBilled | String | If provided, BigQuery will cancel the job if it attempts to process more than this many bytes. |
Region | String | The region to start executing the job in. |
Name | Type | Description |
JobId | String | The JobId of the newly inserted job. |
Region | String | The region where the job is executing. |
Configuration_query_query | String | The query of the newly inserted Job. |
Configuration_query_destinationTable_tableId | String | The destination table tableId of the newly inserted Job. |
Configuration_query_destinationTable_projectId | String | The destination table projectId of the newly inserted Job. |
Configuration_query_destinationTable_datasetId | String | The destination table datasetId of the newly inserted Job. |
Status_State | String | Running state of the job. |
Status_errorResult_reason | String | A short error code that summarizes the error. |
Status_errorResult_message | String | A human-readable description of the error. |
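As a sketch, submitting a hypothetical query job with a destination table in the format described above:
EXEC InsertJob Query = 'SELECT * FROM Northwind.orders', DestinationTable = 'psychic-valve-137816:Northwind.order_results', WriteDisposition = 'WRITE_TRUNCATE'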
Inserts a Google BigQuery load job, which adds data from Google Cloud Storage into an existing table.
Name | Type | Description |
SourceURIs | String | A space-separated list of Google Cloud Storage URIs. |
SourceFormat | String | The format of the source files.
The allowed values are AVRO, NEWLINE_DELIMITED_JSON, DATASTORE_BACKUP, PARQUET, ORC, CSV. |
DestinationTable | String | The destination table for the query, in the format DestProjectId.DestDatasetId.DestTable |
DestinationTableProperties | String | A JSON object containing the table friendlyName, description and list of labels. |
DestinationTableSchema | String | A JSON list containing the fields used to create the table. |
DestinationEncryptionConfiguration | String | A JSON object giving the KMS encryption settings for the table. |
SchemaUpdateOptions | String | A JSON list giving the options to apply when updating the destination table schema. |
TimePartitioning | String | A JSON object giving the time partitioning type and field. |
RangePartitioning | String | A JSON object giving the range partitioning field and buckets. |
Clustering | String | A JSON object giving the fields to be used for clustering. |
Autodetect | String | Whether options and schema should be automatically determined for JSON and CSV files. |
CreateDisposition | String | Whether to create the destination table if it does not exist.
The allowed values are CREATE_IF_NEEDED, CREATE_NEVER. The default value is CREATE_IF_NEEDED. |
WriteDisposition | String | How to write data to the destination table, such as truncate existing results, appending existing results, or writing only when the table is empty.
The allowed values are WRITE_TRUNCATE, WRITE_APPEND, WRITE_EMPTY. The default value is WRITE_APPEND. |
Region | String | The region to start executing the job in. Both the GCS resources and the BigQuery dataset must be in the same region. |
DryRun | String | Whether or not this is a dry run of the job.
The default value is false. |
MaximumBadRecords | String | If provided, the number of records that can be invalid before the entire job is canceled. By default all records must be valid.
The default value is 0. |
IgnoreUnknownValues | String | Whether to ignore unknown fields in the input file or treat them as errors. By default they are treated as errors.
The default value is false. |
AvroUseLogicalTypes | String | Whether to use Avro logical types when converting Avro data into BigQuery types.
The default value is true. |
CSVSkipLeadingRows | String | How many rows to skip at the start of CSV files. Usually used for skipping header rows. |
CSVEncoding | String | The name of the encoding used for CSV files.
The allowed values are ISO-8859-1, UTF-8. The default value is UTF-8. |
CSVNullMarker | String | If provided, this string is used for NULL values within CSV files. By default CSV files cannot use NULL. |
CSVFieldDelimiter | String | The character used to separate columns within CSV files.
The default value is ,. |
CSVQuote | String | The character used for quoted fields in CSV files. May be set to empty to disable quoting.
The default value is ". |
CSVAllowQuotedNewlines | String | Whether CSV files can contain newlines within quoted fields.
The default value is false. |
CSVAllowJaggedRows | String | Whether lines in CSV files may contain missing fields.
The default value is false. |
DSBackupProjectionFields | String | A JSON list of fields to load from a Cloud datastore backup. |
ParquetOptions | String | A JSON object giving the Parquet-specific import options. |
DecimalTargetTypes | String | A JSON list giving the preference order applied to numeric types. |
HivePartitioningOptions | String | A JSON object giving the source-side partitioning options. |
Name | Type | Description |
JobId | String | The JobId of the newly inserted job. |
Region | String | The region where the job is executing. |
Configuration_load_destinationTable_tableId | String | The destination table tableId of the newly inserted Job. |
Configuration_load_destinationTable_projectId | String | The destination table projectId of the newly inserted Job. |
Configuration_load_destinationTable_datasetId | String | The destination table datasetId of the newly inserted Job. |
Status_State | String | Running state of the job. |
Status_errorResult_reason | String | A short error code that summarizes the error. |
Status_errorResult_message | String | A human-readable description of the error. |
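As a sketch, loading a hypothetical CSV file from Google Cloud Storage into an existing table:
EXEC InsertLoadJob SourceURIs = 'gs://example-bucket/orders.csv', SourceFormat = 'CSV', DestinationTable = 'psychic-valve-137816.Northwind.orders', CSVSkipLeadingRows = '1'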
You can query the system tables described in this section to access schema information, information on data source functionality, and batch operation statistics.
The following tables return database metadata for Google BigQuery:
The following tables return information about how to connect to and query the data source:
The following table returns query statistics for data modification queries, including batch operations:
Lists the available databases.
The following query retrieves all databases determined by the connection string:
SELECT * FROM sys_catalogs
Name | Type | Description |
CatalogName | String | The database name. |
Lists the available schemas.
The following query retrieves all available schemas:
SELECT * FROM sys_schemas
Name | Type | Description |
CatalogName | String | The database name. |
SchemaName | String | The schema name. |
Lists the available tables.
The following query retrieves the available tables and views:
SELECT * FROM sys_tables
Name | Type | Description |
CatalogName | String | The database containing the table or view. |
SchemaName | String | The schema containing the table or view. |
TableName | String | The name of the table or view. |
TableType | String | The table type (table or view). |
Description | String | A description of the table or view. |
IsUpdateable | Boolean | Whether the table can be updated. |
Describes the columns of the available tables and views.
The following query returns the columns and data types for the [publicdata].[samples].github_nested table:
SELECT ColumnName, DataTypeName FROM sys_tablecolumns WHERE TableName='github_nested' AND CatalogName='publicdata' AND SchemaName='samples'
Name | Type | Description |
CatalogName | String | The name of the database containing the table or view. |
SchemaName | String | The schema containing the table or view. |
TableName | String | The name of the table or view containing the column. |
ColumnName | String | The column name. |
DataTypeName | String | The data type name. |
DataType | Int32 | An integer indicating the data type. This value is determined at run time based on the environment. |
Length | Int32 | The storage size of the column. |
DisplaySize | Int32 | The designated column's normal maximum width in characters. |
NumericPrecision | Int32 | The maximum number of digits in numeric data. The column length in characters for character and date-time data. |
NumericScale | Int32 | The column scale or number of digits to the right of the decimal point. |
IsNullable | Boolean | Whether the column can contain null. |
Description | String | A brief description of the column. |
Ordinal | Int32 | The sequence number of the column. |
IsAutoIncrement | String | Whether the column value is assigned in fixed increments. |
IsGeneratedColumn | String | Whether the column is generated. |
IsHidden | Boolean | Whether the column is hidden. |
IsArray | Boolean | Whether the column is an array. |
IsReadOnly | Boolean | Whether the column is read-only. |
IsKey | Boolean | Indicates whether a field returned from sys_tablecolumns is the primary key of the table. |
Lists the available stored procedures.
The following query retrieves the available stored procedures:
SELECT * FROM sys_procedures
Name | Type | Description |
CatalogName | String | The database containing the stored procedure. |
SchemaName | String | The schema containing the stored procedure. |
ProcedureName | String | The name of the stored procedure. |
Description | String | A description of the stored procedure. |
ProcedureType | String | The type of the procedure, such as PROCEDURE or FUNCTION. |
Describes stored procedure parameters.
The following query returns information about all of the input parameters for the RefreshOAuthAccessToken stored procedure:
SELECT * FROM sys_procedureparameters WHERE ProcedureName='RefreshOAuthAccessToken' AND (Direction=1 OR Direction=2)
Name | Type | Description |
CatalogName | String | The name of the database containing the stored procedure. |
SchemaName | String | The name of the schema containing the stored procedure. |
ProcedureName | String | The name of the stored procedure containing the parameter. |
ColumnName | String | The name of the stored procedure parameter. |
Direction | Int32 | An integer corresponding to the type of the parameter: input (1), input/output (2), or output (4). Input/output parameters can act as both input and output parameters. |
DataTypeName | String | The name of the data type. |
DataType | Int32 | An integer indicating the data type. This value is determined at run time based on the environment. |
Length | Int32 | The number of characters allowed for character data. The number of digits allowed for numeric data. |
NumericPrecision | Int32 | The maximum precision for numeric data. The column length in characters for character and date-time data. |
NumericScale | Int32 | The number of digits to the right of the decimal point in numeric data. |
IsNullable | Boolean | Whether the parameter can contain null. |
IsRequired | Boolean | Whether the parameter is required for execution of the procedure. |
IsArray | Boolean | Whether the parameter is an array. |
Description | String | The description of the parameter. |
Ordinal | Int32 | The index of the parameter. |
Describes the primary and foreign keys.
The following query retrieves the primary key for the [publicdata].[samples].github_nested table:
SELECT * FROM sys_keycolumns WHERE IsKey='True' AND TableName='github_nested' AND CatalogName='publicdata' AND SchemaName='samples'
Name | Type | Description |
CatalogName | String | The name of the database containing the key. |
SchemaName | String | The name of the schema containing the key. |
TableName | String | The name of the table containing the key. |
ColumnName | String | The name of the key column. |
IsKey | Boolean | Whether the column is a primary key in the table referenced in the TableName field. |
IsForeignKey | Boolean | Whether the column is a foreign key referenced in the TableName field. |
PrimaryKeyName | String | The name of the primary key. |
ForeignKeyName | String | The name of the foreign key. |
ReferencedCatalogName | String | The database containing the primary key. |
ReferencedSchemaName | String | The schema containing the primary key. |
ReferencedTableName | String | The table containing the primary key. |
ReferencedColumnName | String | The column name of the primary key. |
Describes the foreign keys.
The following query retrieves all foreign keys which refer to other tables:
SELECT * FROM sys_foreignkeys WHERE ForeignKeyType = 'FOREIGNKEY_TYPE_IMPORT'
Name | Type | Description |
CatalogName | String | The name of the database containing the key. |
SchemaName | String | The name of the schema containing the key. |
TableName | String | The name of the table containing the key. |
ColumnName | String | The name of the key column. |
PrimaryKeyName | String | The name of the primary key. |
ForeignKeyName | String | The name of the foreign key. |
ReferencedCatalogName | String | The database containing the primary key. |
ReferencedSchemaName | String | The schema containing the primary key. |
ReferencedTableName | String | The table containing the primary key. |
ReferencedColumnName | String | The column name of the primary key. |
ForeignKeyType | String | Designates whether the foreign key is an import (points to other tables) or export (referenced from other tables) key. |
Describes the primary keys.
The following query retrieves the primary keys from all tables and views:
SELECT * FROM sys_primarykeys
Name | Type | Description |
CatalogName | String | The name of the database containing the key. |
SchemaName | String | The name of the schema containing the key. |
TableName | String | The name of the table containing the key. |
ColumnName | String | The name of the key column. |
KeySeq | String | The sequence number of the primary key. |
KeyName | String | The name of the primary key. |
Describes the available indexes. By filtering on indexes, you can write more selective queries with faster query response times.
The following query retrieves all indexes that are not primary keys:
SELECT * FROM sys_indexes WHERE IsPrimary='false'
Name | Type | Description |
CatalogName | String | The name of the database containing the index. |
SchemaName | String | The name of the schema containing the index. |
TableName | String | The name of the table containing the index. |
IndexName | String | The index name. |
ColumnName | String | The name of the column associated with the index. |
IsUnique | Boolean | True if the index is unique. False otherwise. |
IsPrimary | Boolean | True if the index is a primary key. False otherwise. |
Type | Int16 | An integer value corresponding to the index type: statistic (0), clustered (1), hashed (2), or other (3). |
SortOrder | String | The sort order: A for ascending or D for descending. |
OrdinalPosition | Int16 | The sequence number of the column in the index. |
Returns information on the available connection properties and those set in the connection string.
When querying this table, the config connection string should be used:
jdbc:cdata:googlebigquery:config:
This connection string enables you to query this table without a valid connection.
The following query retrieves all connection properties that have been set in the connection string or set through a default value:
SELECT * FROM sys_connection_props WHERE Value <> ''
Name | Type | Description |
Name | String | The name of the connection property. |
ShortDescription | String | A brief description. |
Type | String | The data type of the connection property. |
Default | String | The default value if one is not explicitly set. |
Values | String | A comma-separated list of possible values. A validation error is thrown if another value is specified. |
Value | String | The value you set or a preconfigured default. |
Required | Boolean | Whether the property is required to connect. |
Category | String | The category of the connection property. |
IsSessionProperty | String | Whether the property is a session property, used to save information about the current connection. |
Sensitivity | String | The sensitivity level of the property. This informs whether the property is obfuscated in logging and authentication forms. |
PropertyName | String | A camel-cased truncated form of the connection property name. |
Ordinal | Int32 | The index of the parameter. |
CatOrdinal | Int32 | The index of the parameter category. |
Hierarchy | String | Shows dependent properties associated that need to be set alongside this one. |
Visible | Boolean | Informs whether the property is visible in the connection UI. |
ETC | String | Various miscellaneous information about the property. |
Describes the SELECT query processing that the Cloud can offload to the data source.
See SQL Compliance for SQL syntax details.
Below is an example data set of SQL capabilities. Some aspects of SELECT functionality are returned in a comma-separated list if supported; otherwise, the column contains NO.
Name | Description | Possible Values |
AGGREGATE_FUNCTIONS | Supported aggregation functions. | AVG, COUNT, MAX, MIN, SUM, DISTINCT |
COUNT | Whether the COUNT function is supported. | YES, NO |
IDENTIFIER_QUOTE_OPEN_CHAR | The opening character used to escape an identifier. | [ |
IDENTIFIER_QUOTE_CLOSE_CHAR | The closing character used to escape an identifier. | ] |
SUPPORTED_OPERATORS | A list of supported SQL operators. | =, >, <, >=, <=, <>, !=, LIKE, NOT LIKE, IN, NOT IN, IS NULL, IS NOT NULL, AND, OR |
GROUP_BY | Whether GROUP BY is supported, and, if so, the degree of support. | NO, NO_RELATION, EQUALS_SELECT, SQL_GB_COLLATE |
OJ_CAPABILITIES | The supported varieties of outer joins supported. | NO, LEFT, RIGHT, FULL, INNER, NOT_ORDERED, ALL_COMPARISON_OPS |
OUTER_JOINS | Whether outer joins are supported. | YES, NO |
SUBQUERIES | Whether subqueries are supported, and, if so, the degree of support. | NO, COMPARISON, EXISTS, IN, CORRELATED_SUBQUERIES, QUANTIFIED |
STRING_FUNCTIONS | Supported string functions. | LENGTH, CHAR, LOCATE, REPLACE, SUBSTRING, RTRIM, LTRIM, RIGHT, LEFT, UCASE, SPACE, SOUNDEX, LCASE, CONCAT, ASCII, REPEAT, OCTET, BIT, POSITION, INSERT, TRIM, UPPER, REGEXP, LOWER, DIFFERENCE, CHARACTER, SUBSTR, STR, REVERSE, PLAN, UUIDTOSTR, TRANSLATE, TRAILING, TO, STUFF, STRTOUUID, STRING, SPLIT, SORTKEY, SIMILAR, REPLICATE, PATINDEX, LPAD, LEN, LEADING, KEY, INSTR, INSERTSTR, HTML, GRAPHICAL, CONVERT, COLLATION, CHARINDEX, BYTE |
NUMERIC_FUNCTIONS | Supported numeric functions. | ABS, ACOS, ASIN, ATAN, ATAN2, CEILING, COS, COT, EXP, FLOOR, LOG, MOD, SIGN, SIN, SQRT, TAN, PI, RAND, DEGREES, LOG10, POWER, RADIANS, ROUND, TRUNCATE |
TIMEDATE_FUNCTIONS | Supported date/time functions. | NOW, CURDATE, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, MONTH, QUARTER, WEEK, YEAR, CURTIME, HOUR, MINUTE, SECOND, TIMESTAMPADD, TIMESTAMPDIFF, DAYNAME, MONTHNAME, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, EXTRACT |
REPLICATION_SKIP_TABLES | Indicates tables skipped during replication. | |
REPLICATION_TIMECHECK_COLUMNS | A string array containing a list of columns which will be checked, in the given order, for use as a modified column during replication. |
IDENTIFIER_PATTERN | String value indicating what string is valid for an identifier. | |
SUPPORT_TRANSACTION | Indicates if the provider supports transactions such as commit and rollback. | YES, NO |
DIALECT | Indicates the SQL dialect to use. | |
KEY_PROPERTIES | Indicates the properties which identify the uniform database. | |
SUPPORTS_MULTIPLE_SCHEMAS | Indicates if multiple schemas may exist for the provider. | YES, NO |
SUPPORTS_MULTIPLE_CATALOGS | Indicates if multiple catalogs may exist for the provider. | YES, NO |
DATASYNCVERSION | The CData Data Sync version needed to access this driver. | Standard, Starter, Professional, Enterprise |
DATASYNCCATEGORY | The CData Data Sync category of this driver. | Source, Destination, Cloud Destination |
SUPPORTSENHANCEDSQL | Whether enhanced SQL functionality beyond what is offered by the API is supported. | TRUE, FALSE |
SUPPORTS_BATCH_OPERATIONS | Whether batch operations are supported. | YES, NO |
SQL_CAP | All supported SQL capabilities for this driver. | SELECT, INSERT, DELETE, UPDATE, TRANSACTIONS, ORDERBY, OAUTH, ASSIGNEDID, LIMIT, LIKE, BULKINSERT, COUNT, BULKDELETE, BULKUPDATE, GROUPBY, HAVING, AGGS, OFFSET, REPLICATE, COUNTDISTINCT, JOINS, DROP, CREATE, DISTINCT, INNERJOINS, SUBQUERIES, ALTER, MULTIPLESCHEMAS, GROUPBYNORELATION, OUTERJOINS, UNIONALL, UNION, UPSERT, GETDELETED, CROSSJOINS, GROUPBYCOLLATE, MULTIPLECATS, FULLOUTERJOIN, MERGE, JSONEXTRACT, BULKUPSERT, SUM, SUBQUERIESFULL, MIN, MAX, JOINSFULL, XMLEXTRACT, AVG, MULTISTATEMENTS, FOREIGNKEYS, CASE, LEFTJOINS, COMMAJOINS, WITH, LITERALS, RENAME, NESTEDTABLES, EXECUTE, BATCH, BASIC, INDEX |
PREFERRED_CACHE_OPTIONS | A string value specifies the preferred cacheOptions. | |
ENABLE_EF_ADVANCED_QUERY | Indicates if the driver directly supports advanced queries coming from Entity Framework. If not, queries will be handled client side. | YES, NO |
PSEUDO_COLUMNS | A string array indicating the available pseudo columns. | |
MERGE_ALWAYS | If the value is true, merge mode is forcibly executed in Data Sync. | TRUE, FALSE |
REPLICATION_MIN_DATE_QUERY | A select query to return the replicate start datetime. | |
REPLICATION_MIN_FUNCTION | Allows a provider to specify the formula name to use for executing a server side min. | |
REPLICATION_START_DATE | Allows a provider to specify a replicate startdate. | |
REPLICATION_MAX_DATE_QUERY | A select query to return the replicate end datetime. | |
REPLICATION_MAX_FUNCTION | Allows a provider to specify the formula name to use for executing a server side max. | |
IGNORE_INTERVALS_ON_INITIAL_REPLICATE | A list of tables which will skip dividing the replicate into chunks on the initial replicate. | |
CHECKCACHE_USE_PARENTID | Indicates whether the CheckCache statement should be done against the parent key column. | TRUE, FALSE |
CREATE_SCHEMA_PROCEDURES | Indicates stored procedures that can be used for generating schema files. |
The following query retrieves the operators that can be used in the WHERE clause:
SELECT * FROM sys_sqlinfo WHERE Name = 'SUPPORTED_OPERATORS'
Note that individual tables may have different limitations or requirements on the WHERE clause; refer to the Data Model section for more information.
Name | Type | Description |
NAME | String | A component of SQL syntax, or a capability that can be processed on the server. |
VALUE | String | Detail on the supported SQL or SQL syntax. |
Returns information about attempted modifications.
The following query retrieves the Ids of the modified rows in a batch operation:
SELECT * FROM sys_identity
Name | Type | Description |
Id | String | The database-generated Id returned from a data modification operation. |
Batch | String | An identifier for the batch. 1 for a single operation. |
Operation | String | The result of the operation in the batch: INSERTED, UPDATED, or DELETED. |
Message | String | SUCCESS or an error message if the update in the batch failed. |
The Cloud maps types from the data source to the corresponding data type available in the schema. The table below documents these mappings.
Google BigQuery | CData Schema |
STRING | string | |
BYTES | binary | |
INTEGER | long | |
FLOAT | double | |
NUMERIC | decimal | |
BIGNUMERIC | decimal | |
BOOLEAN | bool | |
DATE | date | |
TIME | time | |
DATETIME | datetime | |
TIMESTAMP | datetime | |
STRUCT | See below | |
ARRAY | See below | |
GEOGRAPHY | string | |
JSON | string | |
INTERVAL | string |
Note that the NUMERIC type supports 38 digits of precision and the BIGNUMERIC type supports 76 digits of precision. Most platforms do not have a decimal type that supports the full precision of these values (.NET decimal supports 28 digits, and Java BigDecimal supports 38 by default). If this is the case, then you can cast these columns to a string when queried, or the connection can be configured to ignore them by setting IgnoreTypes=decimal.
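For example, a high-precision column can be read as text with a cast (the table and column names here are hypothetical):
SELECT CAST(TotalValue AS VARCHAR) AS TotalValue FROM `Northwind`.`invoices`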
Google BigQuery supports two kinds of types for storing compound values in a single row, STRUCT and ARRAY. In some places within Google BigQuery these are also known as RECORD and REPEATED types.
A STRUCT is a fixed-size group of values that are accessed by name and can have different types.
The Cloud flattens structs so their individual fields can be accessed using dotted names.
Note that these dotted names must be quoted.
-- trade_value STRUCT<currency STRING, value FLOAT>
SELECT CONCAT([trade_value.value], ' ', NULLIF([trade_value.currency], 'USD')) FROM trades
An ARRAY is a group of values with the same type that can have any size. The Cloud treats the array as a single compound value and reports it as a JSON aggregate.
These types may be combined such that a STRUCT type contains an ARRAY field, or an ARRAY field is a list of STRUCT values.
The outer type takes precedence in how the field is processed:
/* Table contains fields:
   stocks STRUCT<symbol STRING, prices ARRAY<FLOAT>>
   offers ARRAY<STRUCT<currency STRING, value FLOAT>> */
SELECT
  [stocks.symbol],
  /* An ARRAY field can be read from a STRUCT, but is converted to JSON */
  [stocks.prices],
  /* STRUCT fields in an ARRAY cannot be accessed */
  [offers]
FROM market
The Cloud represents INTERVAL types as strings. Whenever a query requires an INTERVAL type, it must specify the INTERVAL using the BigQuery SQL INTERVAL format:
YEAR-MONTH DAY HOUR:MINUTE:SECOND.FRACTION
All queries that return INTERVAL values use this format unless they appear in an ARRAY aggregate, where the format depends upon how the Cloud reads the data.
For example, the value "5 years and 11 months, minus 10 days and 3 hours and 2.5 seconds" in the correct format is:
5-11 -10 -3:0:2.5
The Cloud exposes parameters on the following types. In each case the type parameters are optional; Google BigQuery has default values for types that are not parameterized.
These parameters are primarily for restricting the data written to the table. They are included in the table metadata as the column size for STRING and BYTES, and the numeric precision and scale for NUMERIC and BIGNUMERIC.
Type parameters have no effect on queries and are not reported within query metadata.
For example, in the query below the output of CONCAT is a plain STRING even though its inputs a and b are both STRING(100).
SELECT CONCAT(a, b) FROM table_with_length_params
The connection string properties are the various options that can be used to establish a connection. This section provides a complete list of the options you can configure in the connection string for this provider. Click the links for further details.
For more information on establishing a connection, see Establishing a Connection.
Property | Description |
AuthScheme | The type of authentication to use when connecting to Google BigQuery. |
ProjectId | The ProjectId used to resolve unqualified tables and execute jobs. |
DatasetId | The DatasetId used to resolve unqualified tables. |
BillingProjectId | The ProjectId of the billing project for executing jobs. |
Property | Description |
AllowLargeResultSets | Whether or not to allow large result sets to be stored in temporary tables. |
DestinationTable | This property determines where query results are stored in Google BigQuery. |
UseQueryCache | Specifies whether to use Google BigQuery's built-in query cache. |
PageSize | The number of results to return per page from Google BigQuery. |
PollingInterval | This determines how long to wait, in seconds, between checks to see if a job has completed. |
AllowUpdatesWithoutKey | Whether or not to allow updates without primary keys. |
FilterColumns | The columns to use as filters for updates. Requires `AllowUpdatesWithoutKey` to be set to true. |
UseLegacySQL | Specifies whether to use BigQuery's legacy SQL dialect for this query. By default, Standard SQL will be used. |
Property | Description |
UseStorageAPI | Specifies whether to use BigQuery's Storage API for bulk data reads. |
UseArrowFormat | Specifies whether to use the Arrow format with BigQuery's Storage API. |
StorageThreshold | The minimum number of rows a query must return to invoke the Storage API. |
StoragePageSize | Specifies the page size to use for Storage API queries. |
Property | Description |
InsertMode | Specifies what kind of method to use when inserting data. By default streaming INSERTs are used. |
WaitForBatchResults | Whether to wait for the job to complete when using the bulk upload API. Only active when InsertMode is set to Upload. |
TempTableDataset | The prefix of the dataset that will contain temporary tables when performing bulk UPDATE or DELETE operations. |
Property | Description |
OAuthJWTCert | The JWT Certificate store. |
OAuthJWTCertType | The type of key store containing the JWT Certificate. |
OAuthJWTCertPassword | The password for the OAuth JWT certificate. |
OAuthJWTCertSubject | The subject of the OAuth JWT certificate. |
OAuthJWTIssuer | The issuer of the Java Web Token. |
OAuthJWTSubject | The user subject for which the application is requesting delegated access. |
Property | Description |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
Property | Description |
Verbosity | The verbosity level that determines the amount of detail included in the log file. |
Property | Description |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC. |
RefreshViewSchemas | Allows the provider to determine up-to-date view schemas automatically. |
ShowTableDescriptions | Controls whether table descriptions are returned via the platform metadata APIs and sys_tables / sys_views. |
PrimaryKeyIdentifiers | Set this property to define primary keys. |
AllowedTableTypes | Specifies what kinds of tables will be visible. |
FlattenObjects | Determines whether the provider flattens STRUCT fields into top-level columns. |
Property | Description |
StorageTimeout | How long a Storage API connection must remain idle before the provider reconnects. |
AllowAggregateParameters | Allows raw aggregates to be used in parameters when QueryPassthrough is enabled. |
ApplicationName | An application name in the form application/version. For example, AcmeReporting/1.0. |
AuditLimit | The maximum number of rows which will be stored within an audit table. |
AuditMode | What provider actions should be recorded to audit tables. |
BigQueryOptions | A comma separated list of Google BigQuery options. |
MaximumBillingTier | The MaximumBillingTier is a positive integer that serves as a multiplier of the basic price per TB. For example, if you set MaximumBillingTier to 2, the maximum cost for that query will be 2x basic price per TB. |
MaximumBytesBilled | Limits how many bytes BigQuery will allow a job to consume before it is cancelled. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns to the table. |
TableSamplePercent | This determines what percent of a table is sampled with the TABLESAMPLE operator. |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
This section provides a complete list of the Authentication properties you can configure in the connection string for this provider.
Property | Description |
AuthScheme | The type of authentication to use when connecting to Google BigQuery. |
ProjectId | The ProjectId used to resolve unqualified tables and execute jobs. |
DatasetId | The DatasetId used to resolve unqualified tables. |
BillingProjectId | The ProjectId of the billing project for executing jobs. |
The type of authentication to use when connecting to Google BigQuery.
string
"Auto"
The ProjectId used to resolve unqualified tables and execute jobs.
string
""
This property and BillingProjectId are used to determine billing for jobs and resolve unqualified table names.
The Cloud must create a job within Google BigQuery to execute certain kinds of queries. For example, complex SELECT statements, UPDATE and DELETE statements, and INSERT statements (when InsertMode is DML) are all executed using jobs. The project where a job executes determines how the job is billed.
The Cloud determines the billing project using these rules. Note that only the first two rules apply when QueryPassthrough is enabled. Either this property or BillingProjectId must be set to execute passthrough queries.
SELECT FirstName, LastName FROM `psychic-valve-137816`.`Northwind`.`customers`
In addition to setting the billing project, the Cloud also uses this property to determine the default data project.
The data project is used to resolve tables included in queries when they are not fully qualified:
/* Unqualified, resolved against connection properties */
SELECT FirstName, LastName FROM `Northwind`.`customers`
/* Qualified, project specified as catalog */
SELECT FirstName, LastName FROM `psychic-valve-137816`.`Northwind`.`customers`
Any unqualified table references in the query are resolved using the following rules. Note that only methods 1 and 2 are supported when QueryPassthrough is enabled. This means that any tables outside the default data project must be explicitly qualified.
SELECT ... FROM `psychic-valve-137816`.`Northwind`.`customers` INNER JOIN `Northwind`.`orders` ON ...
The DatasetId used to resolve unqualified tables.
string
""
When a query refers to a table it can leave the dataset implicit, or qualify the dataset directly as the schema portion of the table:
/* Implicit, resolved against connection string */
SELECT FirstName, LastName FROM `customers`
/* Explicit, dataset specified as schema */
SELECT FirstName, LastName FROM `psychic-valve-137816`.`Northwind`.`customers`
Any unqualified table references in the query are resolved using the following rules. Note that only method 1 is supported when QueryPassthrough is enabled. This means that passthrough queries must set this property or qualify all tables.
SELECT ... FROM `psychic-valve-137816`.`Northwind`.`customers` INNER JOIN `orders` ON ...
The ProjectId of the billing project for executing jobs.
string
""
This property is used with ProjectId to determine the project the Cloud executes jobs under. Please refer to that page for more information.
This section provides a complete list of the BigQuery properties you can configure in the connection string for this provider.
Property | Description |
AllowLargeResultSets | Whether or not to allow large result sets to be stored in temporary tables. |
DestinationTable | This property determines where query results are stored in Google BigQuery. |
UseQueryCache | Specifies whether to use Google BigQuery's built-in query cache. |
PageSize | The number of results to return per page from Google BigQuery. |
PollingInterval | This determines how long to wait, in seconds, between checks to see if a job has completed. |
AllowUpdatesWithoutKey | Whether or not to allow updates without primary keys. |
FilterColumns | The columns to use as filters for updates. Requires `AllowUpdatesWithoutKey` to be set to true. |
UseLegacySQL | Specifies whether to use BigQuery's legacy SQL dialect for this query. By default, Standard SQL will be used. |
Whether or not to allow large result sets to be stored in temporary tables.
bool
false
Whether or not to allow large result sets to be stored in temporary tables.
This property determines where query results are stored in Google BigQuery.
string
""
Google BigQuery queries have a maximum amount of data they are allowed to return directly. If this limit is exceeded, then queries will fail with an error message like "Response too large to return". When this option is enabled the response limit does not apply, because all query responses are stored in a Google BigQuery table before being returned.
This option is set differently depending upon whether your connection is using UseLegacySQL or not. By default this option is set using the standard SQL syntax:
DestinationTable=project-name.dataset-name.table-name
When UseLegacySQL is enabled, this option is set using the legacy table syntax:
DestinationTable=project-name:dataset-name.table-name
When using this option with multiple connections, make sure that each connection has its own destination table. Sharing a table between connections can lead to lost results, because parallel queries can overwrite each other's results.
Specifies whether to use Google BigQuery's built-in query cache.
bool
true
Google BigQuery will cache the results of recent queries, and will use this cache for queries by default. Google BigQuery automatically updates the cache when a table is modified, so performance is generally better without any risk of queries returning stale data.
If this is set to false, the query is always run against the table directly.
The number of results to return per page from Google BigQuery.
string
"100000"
The page size controls the number of results returned per page from Google BigQuery. Setting a higher page size causes more data to come back in a single HTTP request, but may take longer to execute. Setting a smaller page size increases the number of HTTP requests required to get all the data, but is generally recommended to ensure that timeout exceptions do not occur.
Note that this option does not have an effect if UseStorageApi is enabled and the queries being executed can be executed on the Storage API. See StoragePageSize for more information.
This determines how long to wait, in seconds, between checks to see if a job has completed.
string
"1"
This only applies to queries which are stored to a table instead of streamed directly to the Cloud. This applies in only three cases:
This property determines how long to wait between checks on whether or not the query's results are ready. Very large resultsets or complex queries may take longer to process, and a low polling interval may result in many unnecessary requests being made to check the query status.
Whether or not to allow updates without primary keys.
bool
false
Whether or not to allow updates without primary keys.
The columns to use as filters for updates. Requires `AllowUpdatesWithoutKey` to be set to true.
string
""
Remember to set `AllowUpdatesWithoutKey` to true before using this property.
Set the property like this:
`filterColumns=col1[,col2[,col3]];`
Specifies whether to use BigQuery's legacy SQL dialect for this query. By default, Standard SQL will be used.
bool
false
If set to true, the query will use BigQuery's Legacy SQL dialect to rebuild the query.
If set to false, the query will use BigQuery's standard SQL: https://cloud.google.com/bigquery/sql-reference/.
When UseLegacySQL is set to false, the value of AllowLargeResultSets is ignored. The query will be run as if AllowLargeResultSets is true.
This section provides a complete list of the Storage API properties you can configure in the connection string for this provider.
Property | Description |
UseStorageAPI | Specifies whether to use BigQuery's Storage API for bulk data reads. |
UseArrowFormat | Specifies whether to use the Arrow format with BigQuery's Storage API. |
StorageThreshold | The minimum number of rows a query must return to invoke the Storage API. |
StoragePageSize | Specifies the page size to use for Storage API queries. |
Specifies whether to use BigQuery's Storage API for bulk data reads.
bool
true
By default the Cloud will use the Storage API instead of the default REST API. Depending upon the complexity of the query, the Cloud may execute the query in one of two ways:
The BigQuery Storage API can read data faster and more efficiently than the REST API (accessible by setting this option to false), but is priced differently and requires extra OAuth permissions when using your own OAuth app. It also uses the separate StoragePageSize property instead of PageSize.
The BigQuery REST API requires no extra permissions and uses standard pricing, but is slower than the Storage API.
Specifies whether to use the Arrow format with BigQuery's Storage API.
bool
false
This property only has an effect when UseStorageApi is enabled. When performing reads against the Storage API, the Cloud can request data in different formats. By default it uses Avro but enabling this option makes it use Arrow.
This option should be enabled when working with time series data or other datasets that have many date, time, datetime, or timestamp fields. For these datasets, using Arrow can yield noticeable improvements over using Avro. Otherwise, Avro and Arrow read times are very close, and switching between them is unlikely to make a significant difference.
The minimum number of rows a query must return to invoke the Storage API.
string
"100000"
When the Cloud receives a query too complex to be run directly in the Storage API, it creates a query job and uses the Storage API to read from the query results table. If the query job returns fewer than the number of rows provided in this option, then the results are returned directly and the Storage API is not used.
This value should be set between 1 and 100000. Higher values will use the Storage API only for large resultsets, but will be delayed by reading more results from the query job. Lower values will result in smaller delays but will use the Storage API for more queries.
Note that this option only has an effect if UseStorageApi is enabled and the queries being executed cannot be executed directly on the Storage API. Queries which run directly on Storage never create query jobs.
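As an example, the following settings use the Storage API only for resultsets of at least 50000 rows (the threshold value here is arbitrary):
UseStorageAPI=true;StorageThreshold=50000;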
Specifies the page size to use for Storage API queries.
string
"10000"
When UseStorageApi is enabled and the query being executed can be run on the Storage API, this option controls how many rows the Cloud is allowed to buffer on the client.
A higher value will generally make queries faster at the expense of consuming more memory, while lower values will conserve memory but make queries slower.
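For example, a sketch that trades extra client memory for faster reads by raising the page size (value illustrative):

UseStorageAPI=true;StoragePageSize=50000;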
This section provides a complete list of the Uploading properties you can configure in the connection string for this provider.
Property | Description |
InsertMode | Specifies what kind of method to use when inserting data. By default streaming INSERTs are used. |
WaitForBatchResults | Whether to wait for the job to complete when using the bulk upload API. Only active when InsertMode is set to Upload. |
TempTableDataset | The prefix of the dataset that will contain temporary tables when performing bulk UPDATE or DELETE operations. |
Specifies what kind of method to use when inserting data. By default streaming INSERTs are used.
string
"Streaming"
This section provides only a summary of the mechanisms that each of these modes use. Please see Advanced Integrations for more details on how to use each of these modes.
When UseLegacySQL is true only Streaming and Upload modes are allowed. The Legacy SQL dialect does not support DML statements.
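For example, a sketch switching from the default streaming INSERTs to the bulk upload API (the Upload mode name is taken from the WaitForBatchResults description below):

InsertMode=Upload;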
Whether to wait for the job to complete when using the bulk upload API. Only active when InsertMode is set to Upload.
bool
true
This property determines whether the Cloud waits for batch jobs to report their status. By default this property is true, and INSERT queries complete only once Google BigQuery has finished executing them. When this property is false, an INSERT query completes as soon as its job is submitted.
The default mode is recommended for reliability. You can disable this option to achieve lower delays when inserting, but you must then obey the Google BigQuery rate limits and check the status of each job yourself to determine whether it succeeded or failed.
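A sketch of the lower-latency, fire-and-forget configuration described above (you then assume responsibility for tracking job status):

InsertMode=Upload;WaitForBatchResults=false;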
The prefix of the dataset that will contain temporary tables when performing bulk UPDATE or DELETE operations.
string
"_CDataTempTableDataset"
Internally, bulk UPDATE and DELETE use Google BigQuery MERGE queries, which require creating a table to hold all the update operations. This option is combined with the target table's region to determine the name of the dataset where these temporary tables are created. Each region gets its own temporary dataset so that the temporary table is stored in the same region as the table being merged; this avoids unnecessary data transfer charges.
For example, the Cloud would create a dataset called "_CDataTempTableDataset_US" for tables in the US region and a dataset called "_CDataTempTableDataset_asia_southeast_1" for tables in the Singapore region.
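For example, with an illustrative prefix of MyTempTables, US-region tables would use a dataset named "MyTempTables_US" under the naming scheme above:

TempTableDataset=MyTempTables;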
This section provides a complete list of the JWT OAuth properties you can configure in the connection string for this provider.
Property | Description |
OAuthJWTCert | The JWT Certificate store. |
OAuthJWTCertType | The type of key store containing the JWT Certificate. |
OAuthJWTCertPassword | The password for the OAuth JWT certificate. |
OAuthJWTCertSubject | The subject of the OAuth JWT certificate. |
OAuthJWTIssuer | The issuer of the JSON Web Token. |
OAuthJWTSubject | The user subject for which the application is requesting delegated access. |
The JWT Certificate store.
string
""
The name of the certificate store for the client certificate.
The OAuthJWTCertType field specifies the type of the certificate store specified by OAuthJWTCert. If the store is password protected, specify the password in OAuthJWTCertPassword.
OAuthJWTCert is used in conjunction with the OAuthJWTCertSubject field in order to specify client certificates. If OAuthJWTCert has a value, and OAuthJWTCertSubject is set, a search for a certificate is initiated. Please refer to the OAuthJWTCertSubject field for details.
Designations of certificate stores are platform-dependent.
The following are designations of the most common User and Machine certificate stores in Windows:
MY | A certificate store holding personal certificates with their associated private keys. |
CA | Certifying authority certificates. |
ROOT | Root certificates. |
SPC | Software publisher certificates. |
In Java, the certificate store normally is a file containing certificates and optional private keys.
When the certificate store type is PFXFile, this property must be set to the name of the file. When the type is PFXBlob, the property must be set to the binary contents of a PFX file (i.e. PKCS12 certificate store).
The type of key store containing the JWT Certificate.
string
"GOOGLEJSONBLOB"
This property can take one of the following values:
USER | For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note: This store type is not available in Java. |
MACHINE | For Windows, this specifies that the certificate store is a machine store. Note: this store type is not available in Java. |
PFXFILE | The certificate store is the name of a PFX (PKCS12) file containing certificates. |
PFXBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in PFX (PKCS12) format. |
JKSFILE | The certificate store is the name of a Java key store (JKS) file containing certificates. Note: this store type is only available in Java. |
JKSBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in Java key store (JKS) format. Note: this store type is only available in Java. |
PEMKEY_FILE | The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate. |
PEMKEY_BLOB | The certificate store is a string (base64-encoded) that contains a private key and an optional certificate. |
PUBLIC_KEY_FILE | The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate. |
PUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate. |
SSHPUBLIC_KEY_FILE | The certificate store is the name of a file that contains an SSH-style public key. |
SSHPUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains an SSH-style public key. |
P7BFILE | The certificate store is the name of a PKCS7 file containing certificates. |
PPKFILE | The certificate store is the name of a file that contains a PPK (PuTTY Private Key). |
XMLFILE | The certificate store is the name of a file that contains a certificate in XML format. |
XMLBLOB | The certificate store is a string that contains a certificate in XML format. |
GOOGLEJSON | The certificate store is the name of a JSON file containing the service account information. Only valid when connecting to a Google service. |
GOOGLEJSONBLOB | The certificate store is a string that contains the service account JSON. Only valid when connecting to a Google service. |
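As a hedged sketch, a service-account connection built from the properties in this section might pair a downloaded Google JSON key file with the GOOGLEJSON store type (the file path is a placeholder, and any additional authentication properties your deployment requires are omitted):

OAuthJWTCertType=GOOGLEJSON;OAuthJWTCert=C:\keys\service_account.json;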
The password for the OAuth JWT certificate.
string
""
If the certificate store is of a type that requires a password, this property is used to specify that password in order to open the certificate store.
This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys are not encrypted.
The subject of the OAuth JWT certificate.
string
"*"
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property.
If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma-separated list of distinguished name fields and values. For instance "CN=www.server.com, OU=test, C=US, E=support@company.com". Common fields and their meanings are displayed below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma it must be quoted.
The issuer of the JSON Web Token.
string
""
The issuer of the JSON Web Token. Enter the value of a delegated user Email Address.
This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys contain a copy of the issuer account.
The user subject for which the application is requesting delegated access.
string
""
The user subject for which the application is requesting delegated access. Enter the value of the Service Account Email.
This section provides a complete list of the SSL properties you can configure in the connection string for this provider.
Property | Description |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
The certificate to be accepted from the server when connecting using TLS/SSL.
string
""
If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine is rejected.
This property can take the following forms:
Description | Example |
A full PEM Certificate (example shortened for brevity) | -----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE----- |
A path to a local file containing the certificate | C:\cert.cer |
The public key (example shortened for brevity) | -----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY----- |
The MD5 Thumbprint (hex values can also be either space or colon separated) | ecadbdda5a1529c58a1e9e09828d70e4 |
The SHA1 Thumbprint (hex values can also be either space or colon separated) | 34a929226ae0819f2ec14b4a3d904f801cbb150d |
If not specified, any certificate trusted by the machine is accepted.
Use '*' to accept all certificates; note that this is not recommended due to security concerns.
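For example, a sketch that pins the server certificate using the SHA1 thumbprint form from the table above:

SSLServerCert=34a929226ae0819f2ec14b4a3d904f801cbb150d;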
This section provides a complete list of the Logging properties you can configure in the connection string for this provider.
Property | Description |
Verbosity | The verbosity level that determines the amount of detail included in the log file. |
The verbosity level that determines the amount of detail included in the log file.
string
"1"
The verbosity level determines the amount of detail that the Cloud reports to the Logfile. Verbosity levels from 1 to 5 are supported. These are detailed in the Logging page.
This section provides a complete list of the Schema properties you can configure in the connection string for this provider.
Property | Description |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC. |
RefreshViewSchemas | Allows the provider to determine up-to-date view schemas automatically. |
ShowTableDescriptions | Controls whether table descriptions are returned via the platform metadata APIs and sys_tables / sys_views. |
PrimaryKeyIdentifiers | Set this property to define primary keys. |
AllowedTableTypes | Specifies what kinds of tables will be visible. |
FlattenObjects | Determines whether the provider flattens STRUCT fields into top-level columns. |
This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.
string
""
Listing the schemas from databases can be expensive. Providing a list of schemas in the connection string improves performance.
Allows the provider to determine up-to-date view schemas automatically.
bool
true
When using BigQuery views, BigQuery stores a copy of the view schema with the view itself. However, these stored view schemas are not updated when the tables used by the view change. This means that the stored view schema can easily become out of date and cause queries using the view to fail.
By default, the Cloud will not use the stored view schema and will instead query the view to determine the available columns. This guarantees that the schema will be up to date although it requires the Cloud to start a query job.
You can disable this option to force the Cloud to use the stored view schemas. This prevents the Cloud from running any queries when getting a view schema, but also means that queries using the view will fail if the schema is out of date.
Controls whether table descriptions are returned via the platform metadata APIs and sys_tables / sys_views.
bool
false
By default table descriptions are not shown, since the Google BigQuery API requires an extra request beyond what is usually required for reading tables.
Enabling this option will show table descriptions, but will cost an extra API request for every table when a table list is fetched. This can slow down metadata operations on large datasets.
Set this property to define primary keys.
string
""
Google BigQuery does not natively support primary keys, but for certain DML operations or database tools you may need to define them. By default this option is disabled and no tables will have primary keys except for the ones defined in schema files (if you set Location).
Primary keys are defined using a list of rules which match tables and provide a list of key columns. For example, PrimaryKeyIdentifiers="*=key;transactions=tx_date,tx_serial;user_comments=" has three rules separated by semicolons:
*=key gives every table a primary key called key.
transactions=tx_date,tx_serial gives the table transactions a composite key of tx_date and tx_serial.
user_comments= removes the primary key from the table user_comments, overriding the first rule.
Note that the table names can include just the table, the table and dataset or the table, dataset and project.
Both column and table names may be quoted using SQL quotes:
/* Rules with just table names use the connection ProjectId (or DataProjectId) and DatasetId.
   All these rules refer to the same table with a connection where ProjectId=someProject;DatasetId=someDataset */
someTable=a,b,c
someDataset.someTable=a,b,c
someProject.someDataset.someTable=a,b,c

/* Any table or column name may be quoted */
`someProject`."someDataset".[someTable]=`a`,[b],"c"
Specifies what kinds of tables will be visible.
string
"TABLE,EXTERNAL,VIEW,MATERIALIZED_VIEW"
This option is a comma-separated list of the table type values that the Cloud displays. Any table-like or view-like entity that doesn't have a matching type will not be reported when listing tables.
For example, to restrict the Cloud to listing only simple tables and views, this option would be set to TABLE,VIEW.
Determines whether the provider flattens STRUCT fields into top-level columns.
bool
true
By default the Cloud reports each field in a STRUCT column as its own column while the STRUCT column itself is hidden.
This process is recursively applied to nested STRUCT values.
For example, if the following table is defined in Google BigQuery then the Cloud reports 3 columns: location.coords.lat, location.coords.lon and location.country:
CREATE TABLE t(location STRUCT<coords STRUCT<lat FLOAT64, lon FLOAT64>, country STRING>);
If this property is disabled, then the top-level STRUCT is not expanded and is left as its own column. The value of this column is reported as a JSON aggregate. In the above example, the Cloud reports only the location column when flattening is disabled.
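Given the example table above, a query against the flattened columns might look like the following sketch (the bracket quoting of the dotted column names is an assumption; exact quoting can vary by client tool):

SELECT [location.coords.lat], [location.coords.lon], [location.country] FROM t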
This section provides a complete list of the Miscellaneous properties you can configure in the connection string for this provider.
Property | Description |
StorageTimeout | How long a Storage API connection must remain idle before the provider reconnects. |
AllowAggregateParameters | Allows raw aggregates to be used in parameters when QueryPassthrough is enabled. |
ApplicationName | An application name in the form application/version. For example, AcmeReporting/1.0. |
AuditLimit | The maximum number of rows which will be stored within an audit table. |
AuditMode | What provider actions should be recorded to audit tables. |
BigQueryOptions | A comma separated list of Google BigQuery options. |
MaximumBillingTier | The MaximumBillingTier is a positive integer that serves as a multiplier of the basic price per TB. For example, if you set MaximumBillingTier to 2, the maximum cost for that query will be 2x basic price per TB. |
MaximumBytesBilled | Limits how many bytes BigQuery will allow a job to consume before it is cancelled. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns to the table. |
TableSamplePercent | This determines what percent of a table is sampled with the TABLESAMPLE operator. |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
How long a Storage API connection must remain idle before the provider reconnects.
string
"60"
Google BigQuery and many proxies/firewalls restrict how long idle connections stay alive before they are forcibly closed. This can be a problem when using the Storage API because the Cloud may stream data faster than it can be consumed: while the consumer is catching up, the Cloud does not use its connection, and the connection may be closed by the time the Cloud next uses it.
To avoid this, the Cloud automatically closes and reopens the connection if it has been idle for too long. This property controls how many seconds the connection must be idle before the Cloud resets it. To disable these resets, this property can also be set to 0 or a negative value.
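For example, a sketch that allows five minutes of idle time before a reset, and one that disables resets entirely (values illustrative):

StorageTimeout=300;
StorageTimeout=0;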
Allows raw aggregates to be used in parameters when QueryPassthrough is enabled.
bool
false
This option affects how string parameters are handled when using direct queries through QueryPassthrough. For example, consider this query:
INSERT INTO proj.data.tbl(x) VALUES (@x)
By default, this option is disabled and string parameters are quoted and escaped into SQL strings. That means that any value can be safely used as a string parameter, but it also means that parameters cannot be used as raw aggregate values:
/*
 * If @x is set to: test value ' contains quote
 *
 * Result is a valid query
 */
INSERT INTO proj.data.tbl(x) VALUES ('test value \' contains quote')

/*
 * If @x is set to: ['valid', ('aggregate', 'value')]
 *
 * Result contains string instead of aggregate:
 */
INSERT INTO proj.data.tbl(x) VALUES ('[\'valid\', (\'aggregate\', \'value\')]')
When this option is enabled, string parameters are inserted directly into the query. This means that raw aggregates can be used as parameters, but it also means that all simple strings must be escaped:
/*
 * If @x is set to: test value ' contains quote
 *
 * Result is an invalid query
 */
INSERT INTO proj.data.tbl(x) VALUES (test value ' contains quote)

/*
 * If @x is set to: ['valid', ('aggregate', 'value')]
 *
 * Result is an aggregate
 */
INSERT INTO proj.data.tbl(x) VALUES (['valid', ('aggregate', 'value')])
An application name in the form application/version. For example, AcmeReporting/1.0.
string
""
The Cloud identifies itself to BigQuery using a Google partner User-Agent header. The first part of the User-Agent is fixed and identifies the client as a specific build of the CData Cloud. The last portion reports the specific application using the Cloud.
The maximum number of rows which will be stored within an audit table.
string
"1000"
When auditing is enabled with the AuditMode option, this property is used to determine how many rows will be allowed in the audit table at once.
By default this property is 1000, meaning that only the 1000 most recent audit events will be available within the audit table.
This property can also be set to -1, which places no limit on the size of the audit table. In this mode, the audit table should be periodically cleared to prevent the Cloud from using excessive memory. For example, the audit table can be cleared with:
DELETE FROM AuditJobs#TEMP
What provider actions should be recorded to audit tables.
string
""
The Cloud can record certain internal actions it takes when running queries. For each action listed in this option, the Cloud creates a temporary audit table which logs when the action took place, what query caused the action, and any other relevant information.
By default this option is set to 'none' and the Cloud does not record any audit information. This option can also be set to a comma-separated list of the following actions:
Mode Name | Audit Table | Description | Columns |
start-jobs | AuditJobs#TEMP | Records all jobs started by the Cloud | Timestamp,Query,ProjectId,Location,JobId |
Refer to AuditLimit for more information on how to limit the size of these tables.
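As a sketch, job auditing could be enabled and then read back as follows (the table and column names are taken from the table above):

AuditMode=start-jobs;
SELECT Timestamp, Query, JobId FROM AuditJobs#TEMP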
A comma separated list of Google BigQuery options.
string
""
A list of Google BigQuery options:
Option | Description |
gbqoImplicitJoinAsUnion | This option will prevent the driver from converting an IMPLICIT JOIN into a CROSS JOIN as expected by SQL92. Instead, it will leave it as an IMPLICIT JOIN, which Google BigQuery will execute as a UNION ALL. |
The MaximumBillingTier is a positive integer that serves as a multiplier of the basic price per TB. For example, if you set MaximumBillingTier to 2, the maximum cost for that query will be 2x basic price per TB.
string
""
Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). If unspecified, this is set to your project default. If your query is too compute-intensive for BigQuery to complete at the standard per-TB pricing tier, BigQuery returns a billingTierLimitExceeded error and an estimate of how much the query would cost. To run the query at a higher pricing tier, pass a new value for maximumBillingTier as part of the query request.
Limits how many bytes BigQuery will allow a job to consume before it is cancelled.
string
""
When this value is provided, all jobs will use this value as their default billing cap. If a job uses more than this many bytes, BigQuery will cancel it and it will not be billed. By default there is no cap and all jobs will be billed for however many bytes they consume.
This only has an effect when using DestinationTable or when using the InsertJob stored procedure. BigQuery does not allow standard query jobs to have byte limits.
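For example, a sketch capping each job at roughly 1 GB of billed bytes (the value is an assumption for illustration):

MaximumBytesBilled=1000000000;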
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
int
-1
This property indicates whether or not to include pseudo columns as columns to the table.
string
""
This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; for example, "*=*".
This determines what percent of a table is sampled with the TABLESAMPLE operator.
string
""
This option can be set to make the Cloud use the TABLESAMPLE operator for each table referenced by a query. The value determines what percent is provided to the PERCENT clause. That clause is only generated if this property's value is above zero.

-- Input SQL
SELECT * FROM `tbl`

-- Generated Google BigQuery SQL when TableSamplePercent=10
SELECT * FROM `tbl` TABLESAMPLE SYSTEM (10 PERCENT)
This option is subject to a few limitations:
The value in seconds until the timeout error is thrown, canceling the operation.
string
"300"
If Timeout = 0, operations do not time out. The operations run until they complete successfully or until they encounter an error condition.
If Timeout expires and the operation is not yet complete, the Cloud throws an exception.
protobuf
Copyright 2008 Google Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Google Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Code generated by the Protocol Buffer compiler is owned by the owner of the input file used when generating it. This code is not standalone and requires a support library to be linked with it. This support library is itself covered by the above license.