Google BigQuery Connector for CData Sync

Build 23.0.8839
  • Google BigQuery
    • Establishing a Connection
      • Advanced Integrations
      • Minimum Required Roles
    • Advanced Features
      • SSL Configuration
      • Firewall and Proxy
    • Data Model
      • Tables
      • Views
        • Datasets
        • PartitionsList
        • PartitionsValues
        • Projects
      • Data Type Mapping
    • Connection String Options
      • Authentication
        • AuthScheme
        • ProjectId
        • DatasetId
      • BigQuery
        • AllowLargeResultSets
        • UseQueryCache
        • PageSize
        • PollingInterval
        • AllowUpdatesWithoutKey
        • FilterColumns
        • UseLegacySQL
      • Storage API
        • UseStorageAPI
        • UseArrowFormat
        • StorageThreshold
        • StoragePageSize
      • Uploading
        • InsertMode
        • WaitForBatchResults
        • GCSBucket
        • GCSBucketFolder
        • TempTableDataset
      • OAuth
        • OAuthClientId
        • OAuthClientSecret
      • JWT OAuth
        • OAuthJWTCert
        • OAuthJWTCertType
        • OAuthJWTCertPassword
        • OAuthJWTCertSubject
        • OAuthJWTIssuer
        • OAuthJWTSubject
      • SSL
        • SSLServerCert
      • Firewall
        • FirewallType
        • FirewallServer
        • FirewallPort
        • FirewallUser
        • FirewallPassword
      • Proxy
        • ProxyAutoDetect
        • ProxyServer
        • ProxyPort
        • ProxyAuthScheme
        • ProxyUser
        • ProxyPassword
        • ProxySSLType
        • ProxyExceptions
      • Logging
        • LogModules
      • Schema
        • Location
        • BrowsableSchemas
        • Tables
        • Views
        • RefreshViewSchemas
        • ShowTableDescriptions
        • PrimaryKeyIdentifiers
        • AllowedTableTypes
        • FlattenObjects
      • Miscellaneous
        • StorageTimeout
        • AllowAggregateParameters
        • ApplicationName
        • AuditLimit
        • AuditMode
        • BigQueryOptions
        • GenerateSchemaFiles
        • MaximumBillingTier
        • MaximumBytesBilled
        • MaxRows
        • Other
        • PseudoColumns
        • QueryPassthrough
        • TableSamplePercent
        • Timeout
        • UserDefinedViews
    • Third Party Copyrights

Overview

The CData Sync App provides a straightforward way to continuously pipeline your Google BigQuery data to any database, data lake, or data warehouse, making it easily available for Analytics, Reporting, AI, and Machine Learning.

The Google BigQuery connector can be used from the CData Sync application to pull data from Google BigQuery and move it to any of the supported destinations.

Google BigQuery Version Support

The Sync App enables read/write SQL-92 access to the BigQuery tables in your Google account or Google Apps domain. The complete aggregate and join syntax in BigQuery is supported. Additionally, statements in the BigQuery syntax can be passed through. The Sync App uses version 2.0 of the BigQuery Web services API. You must enable this API by creating a project in the Google Developers Console. See Connecting to Google for a guide to creating a project and authenticating to this API.

Establishing a Connection

Adding a Connection to Google BigQuery

To add a connection to Google BigQuery:

  1. In the application console, navigate to the Connections page.
  2. At the Add Connections panel, select the icon for the connection you want to add.
  3. If the Google BigQuery icon is not available, click the Add More icon to download and install the Google BigQuery connector from the CData site.

For required properties, see the Settings tab.

For connection properties that are not typically required, see the Advanced tab.

Connecting to Google BigQuery

By default, the CData Sync App connects to all available projects in your database. To limit the scope of your connection, set combinations of the following properties:

  • ProjectId: specifies which projects the driver connects to
  • BillingProjectId: specifies which projects are billed
  • DatasetId: specifies which datasets the driver accesses
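
As a rough illustration, the scoping properties above could be assembled into a connection string as sketched below. The project and dataset names are placeholders, and the semicolon-separated Key=Value layout is an assumption based on how settings such as IgnoreTypes=decimal are written elsewhere in this guide. build_connection_string is a hypothetical helper, not part of the Sync App.

```python
def build_connection_string(properties: dict) -> str:
    """Join connection properties as Key=Value pairs, each ending in a semicolon.

    Properties whose value is None or empty are skipped, so the connection
    keeps its default scope for them.
    """
    return "".join(f"{name}={value};" for name, value in properties.items() if value)


# Placeholder values; substitute your own project and dataset names.
scoped = build_connection_string({
    "ProjectId": "my-data-project",
    "BillingProjectId": "my-billing-project",
    "DatasetId": "Northwind",
})
print(scoped)
```

Leaving a property out of the dictionary (or setting it to None) leaves that part of the scope unrestricted in this sketch.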

Authenticating to Google BigQuery

The Sync App supports using user accounts, service accounts, and GCP instance accounts for authentication.

The following sections discuss the available authentication schemes for Google BigQuery:

  • User Accounts (OAuth)
  • Service Account (OAuthJWT)
  • GCP Instance Account

User Accounts (OAuth)

AuthScheme must be set to OAuth in all user account flows.

Web Applications

When connecting via a Web application, you need to create and register a custom OAuth application with Google BigQuery. You can then use the Sync App to acquire and manage the OAuth token values. See Creating a Custom OAuth App for more information about custom applications.

Get an OAuth Access Token

Set the following connection properties to obtain the OAuthAccessToken:

  • OAuthClientId: Set this to the Client Id in your application settings.
  • OAuthClientSecret: Set this to the Client Secret in your application settings.

Then call stored procedures to complete the OAuth exchange:

  1. Call the GetOAuthAuthorizationURL stored procedure. Set the CallbackURL input to the Callback URL you specified in your application settings. The stored procedure returns the URL to the OAuth endpoint.
  2. Navigate to the URL that the stored procedure returned in Step 1. Log in to the custom OAuth application and authorize the web application. Once authenticated, the browser redirects you to the callback URL.
  3. Call the GetOAuthAccessToken stored procedure. Set AuthMode to WEB and the Verifier input to the "code" parameter in the query string of the callback URL.
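
Step 3 needs the "code" parameter from the callback URL. The sketch below shows one way to pull it out with Python's standard library. The redirect URL, port, and code value are made-up examples, and extract_verifier is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse


def extract_verifier(callback_url: str) -> str:
    """Return the value of the "code" query parameter from the redirect URL."""
    query = parse_qs(urlparse(callback_url).query)
    codes = query.get("code")
    if not codes:
        raise ValueError("callback URL does not contain a 'code' parameter")
    return codes[0]


# Hypothetical redirect URL copied from the browser address bar.
redirected = "http://localhost:33333/?code=4%2FabcDEF123&scope=bigquery"
verifier = extract_verifier(redirected)
print(verifier)
```

parse_qs also URL-decodes the value, so the string passed as the Verifier input is the decoded code rather than the percent-encoded form shown in the address bar.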

Once you have obtained the access and refresh tokens, you can connect to data and refresh the OAuth access token either automatically or manually.

Automatic Refresh of the OAuth Access Token

To have the driver automatically refresh the OAuth access token, set the following on the first data connection:

  • InitiateOAuth: Set this to REFRESH.
  • OAuthClientId: Set this to the Client Id in your application settings.
  • OAuthClientSecret: Set this to the Client Secret in your application settings.
  • OAuthAccessToken: Set this to the access token returned by GetOAuthAccessToken.
  • OAuthRefreshToken: Set this to the refresh token returned by GetOAuthAccessToken.
  • OAuthSettingsLocation: Set this to the location where the Sync App saves the OAuth token values, which persist across connections.

On subsequent data connections, the values for OAuthAccessToken and OAuthRefreshToken are taken from OAuthSettingsLocation.

Manual Refresh of the OAuth Access Token

The only value needed to manually refresh the OAuth access token when connecting to data is the OAuth refresh token.

After the ExpiresIn parameter value returned by GetOAuthAccessToken has elapsed, use the RefreshOAuthAccessToken stored procedure to manually refresh the OAuthAccessToken. First, set the following connection properties:

  • OAuthClientId: Set this to the Client Id in your application settings.
  • OAuthClientSecret: Set this to the Client Secret in your application settings.

Then call RefreshOAuthAccessToken with OAuthRefreshToken set to the OAuth refresh token returned by GetOAuthAccessToken. After the new tokens have been retrieved, open a new connection by setting the OAuthAccessToken property to the value returned by RefreshOAuthAccessToken.

Finally, store the OAuth refresh token so that you can use it to manually refresh the OAuth access token after it has expired.
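
When refreshing manually, the caller has to track when the token expires. The following is a minimal sketch of that bookkeeping. It assumes ExpiresIn is expressed in seconds and adds a safety margin that is not part of the Sync App. token_expired is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def token_expired(obtained_at: datetime, expires_in_seconds: int,
                  now: datetime, margin_seconds: int = 60) -> bool:
    """Return True once the access token should be refreshed.

    The margin triggers the refresh shortly before the ExpiresIn
    period actually elapses.
    """
    expiry = obtained_at + timedelta(seconds=expires_in_seconds)
    return now >= expiry - timedelta(seconds=margin_seconds)


issued = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
still_valid = token_expired(issued, 3600, issued + timedelta(minutes=30))
needs_refresh = token_expired(issued, 3600, issued + timedelta(minutes=59, seconds=30))
print(still_valid, needs_refresh)
```

When the function returns True, call RefreshOAuthAccessToken as described above and record the time at which the new token was issued.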

Headless Machines

To configure the driver to use OAuth with a user account on a headless machine, you need to authenticate on another device that has an internet browser.

  1. Choose one of two options:
    • Option 1: Obtain the OAuthVerifier value as described in "Obtain and Exchange a Verifier Code" below.
    • Option 2: Install the Sync App on a machine with an Internet browser and transfer the OAuth authentication values after you authenticate through the usual browser-based flow, as described in "Transfer OAuth Settings" below.
  2. Then configure the Sync App to automatically refresh the access token on the headless machine.

Option 1: Obtain and Exchange a Verifier Code

To obtain a verifier code, you must authenticate at the OAuth authorization URL.

Follow the steps below to authenticate from the machine with an Internet browser and obtain the OAuthVerifier connection property.

  1. Choose one of these options:
    • If you are using the Embedded OAuth Application, open the Google BigQuery OAuth endpoint in your browser.
    • If you are using a custom OAuth application, create the Authorization URL by setting the following properties:
      • InitiateOAuth: Set to OFF.
      • OAuthClientId: Set to the client Id assigned when you registered your application.
      • OAuthClientSecret: Set to the client secret assigned when you registered your application.
      Then call the GetOAuthAuthorizationURL stored procedure with the appropriate CallbackURL. Open the URL returned by the stored procedure in a browser.
  2. Log in and grant permissions to the Sync App. You are then redirected to the callback URL, which contains the verifier code.
  3. Save the value of the verifier code. Later you will set this in the OAuthVerifier connection property.

Next, you need to exchange the OAuth verifier code for OAuth refresh and access tokens.

On the headless machine, set the following connection properties to obtain the OAuth authentication values:

  • InitiateOAuth: Set this to REFRESH.
  • OAuthVerifier: Set this to the verifier code.
  • OAuthClientId: (custom applications only) Set this to the Client Id in your custom OAuth application settings.
  • OAuthClientSecret: (custom applications only) Set this to the Client Secret in the custom OAuth application settings.
  • OAuthSettingsLocation: Set this to persist the encrypted OAuth authentication values to the specified location.

After the OAuth settings file is generated, you need to re-set the following properties to connect:

  • InitiateOAuth: Set this to REFRESH.
  • OAuthClientId: (custom applications only) Set this to the client Id assigned when you registered your application.
  • OAuthClientSecret: (custom applications only) Set this to the client secret assigned when you registered your application.
  • OAuthSettingsLocation: Set this to the location containing the encrypted OAuth authentication values. Make sure this location gives read and write permissions to the Sync App to enable the automatic refreshing of the access token.

Option 2: Transfer OAuth Settings

Prior to connecting on a headless machine, you need to create and install a connection with the driver on a device that supports an Internet browser. Set the connection properties as described in "Desktop Applications" above.

After completing the instructions in "Desktop Applications", the resulting authentication values are encrypted and written to the location specified by OAuthSettingsLocation. The default filename is OAuthSettings.txt.

Once you have successfully tested the connection, copy the OAuth settings file to your headless machine.

On the headless machine, set the following connection properties to connect to data:

  • InitiateOAuth: Set this to REFRESH.
  • OAuthClientId: (custom applications only) Set this to the client Id assigned when you registered your application.
  • OAuthClientSecret: (custom applications only) Set this to the client secret assigned when you registered your application.
  • OAuthSettingsLocation: Set this to the location of your OAuth settings file. Make sure this location gives read and write permissions to the Sync App to enable the automatic refreshing of the access token.
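
Because automatic refresh requires read and write access to the settings file, it can help to check the copied file before connecting. The sketch below does this with Python's standard library. It uses a temporary directory as a stand-in for the real OAuthSettingsLocation, and settings_file_usable is a hypothetical helper.

```python
import os
import tempfile


def settings_file_usable(path: str) -> bool:
    """Return True if the file exists and the current user can read and write it."""
    return os.path.isfile(path) and os.access(path, os.R_OK | os.W_OK)


# Stand-in location; on a real machine, point this at your OAuthSettingsLocation.
with tempfile.TemporaryDirectory() as folder:
    settings_path = os.path.join(folder, "OAuthSettings.txt")
    usable_before_copy = settings_file_usable(settings_path)
    with open(settings_path, "w") as handle:
        handle.write("placeholder for the copied settings")
    usable_after_copy = settings_file_usable(settings_path)

print(usable_before_copy, usable_after_copy)
```

The check runs as the current user, so run it under the same account that the Sync App uses.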

GCP Instance Accounts

When running on a GCP virtual machine, the Sync App can authenticate using a service account tied to the virtual machine. To use this mode, set AuthScheme to GCPInstanceAccount.

Advanced Integrations

The following sections detail Sync App settings that may be needed in advanced integrations.

Saving Result Sets

Large result sets must be saved in a temporary or permanent table. You can use the following properties to control table persistence:

Automatic Result Tables

Enable the AllowLargeResultSets property to make the Sync App automatically create destination tables when needed. If a query result is too large to fit the BigQuery query cache, the Sync App creates a hidden dataset within the data project and re-executes the query with a destination table in that dataset. The dataset is configured so that all tables created within it expire in 24 hours.

In some situations you may want to change the name of the dataset created by the Sync App. For example, this is needed when multiple users are using the Sync App and do not have permission to write to datasets created by the other users. See TempTableDataset for details on how to do this.

Limiting Billing

Set MaximumBillingTier to override your project limits on the maximum cost for any given query in a connection.

Bulk Modes

Google BigQuery provides several interfaces for operating on batches of rows. The Sync App supports these methods through the InsertMode option. Each mode is specialized for a different use case:

  • The Streaming API is intended for use where the most important factor is being able to insert quickly. However, rows inserted via the API are queued and only appear in the table after a delay. This delay can sometimes be as high as 20-30 minutes, which makes this API incompatible with cases where you want to insert data and then run other operations on it immediately. You should avoid modifying the table while any rows are in the streaming queue: Google BigQuery prevents DML operations from running on the table while any rows are in the streaming queue, and changing the table's metadata (name, schema, etc.) may cause streamed rows that haven't been committed to be lost.
  • The DML mode uses Standard SQL INSERT queries to upload data. This is by far the most robust method of uploading data because any errors in the uploaded rows are reported immediately. The Sync App also uses this API synchronously, so once the INSERT is processed, the rows can be used by other operations without waiting. However, it is by far the slowest insert method and should only be used for small data volumes.
  • The Upload mode uses the multipart upload API for uploading data. This method is intended for performing low-cost medium to large data loads within a reasonable time. When using this mode, the Sync App uploads the inserted rows to Google-managed storage and then creates a load job for them. This job executes, and the Sync App can either wait for it (see WaitForBatchResults) or let it run asynchronously. Waiting for the job reports any errors that the job encounters but takes more time. Determining if the job failed without waiting for it requires manually checking the job status via the job stored procedures.
  • The GCSStaging mode is the same as Upload except that it uses your Google Cloud Storage account to store staged data instead of Google-managed storage. The Sync App cannot act asynchronously in this mode because it must delete the file after the load is complete, which means that WaitForBatchResults has no effect.
    Because this depends on external data, you must set the GCSBucket to the name of your bucket and ensure that Scope (a space delimited set of scopes) contains at least the scopes https://www.googleapis.com/auth/bigquery and https://www.googleapis.com/auth/devstorage.read_write. The devstorage scope used for GCS also requires that you connect using a service account because Google BigQuery does not allow user accounts to use this scope.
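
The scope requirement for GCSStaging can be checked before connecting. The sketch below splits a space-delimited Scope value and reports which of the two required scopes are absent. missing_scopes is a hypothetical helper, not a Sync App function.

```python
# The two scopes that the GCSStaging mode requires, as listed above.
REQUIRED_GCS_STAGING_SCOPES = (
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/devstorage.read_write",
)


def missing_scopes(scope_property: str) -> list:
    """Return the required scopes that are absent from a space-delimited Scope value."""
    granted = set(scope_property.split())
    return [scope for scope in REQUIRED_GCS_STAGING_SCOPES if scope not in granted]


only_bigquery = missing_scopes("https://www.googleapis.com/auth/bigquery")
print(only_bigquery)
```

An empty list means the Scope value covers both requirements. Any returned entries should be appended to Scope before InsertMode is set to GCSStaging.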

In addition to bulk INSERTs, the Sync App also supports performing bulk UPDATE and DELETE operations. This requires the Sync App to upload the data containing the filters and rows to set into a new table in BigQuery, then perform a MERGE between the two tables and drop the temporary table. InsertMode determines how the rows are inserted into the temporary table but the Streaming and DML modes are not supported.

In most cases the Sync App can determine what columns need to be part of the SET vs. WHERE clauses of a bulk update. If you receive an error like "Primary keys must be defined for bulk UPDATE support," you can use PrimaryKeyIdentifiers to tell the Sync App what columns to treat as keys. In an update the values of key columns are used only to find matching rows and cannot be updated.

Minimum Required Roles

Minimum Required Roles for Service Accounts

The following roles allow SELECT queries to work with a service account:

  • BigQuery Data Viewer (roles/bigquery.dataViewer): read data and metadata
  • BigQuery Filtered Data Viewer (roles/bigquery.filteredDataViewer): view filtered table data
  • BigQuery Job User (roles/bigquery.jobUser): run jobs, including queries, within the project

Advanced Features

This section details a selection of advanced features of the Google BigQuery Sync App.

User Defined Views

The Sync App allows you to define virtual tables, called user defined views, whose contents are decided by a pre-configured query. These views are useful when you cannot directly control queries being issued to the drivers. See User Defined Views for an overview of creating and configuring custom views.

SSL Configuration

Use SSL Configuration to adjust how the Sync App handles TLS/SSL certificate negotiations. You can choose from various certificate formats; see the SSLServerCert property under "Connection String Options" for more information.

Firewall and Proxy

Configure the Sync App for compliance with Firewall and Proxy, including Windows proxies and HTTP proxies. You can also set up tunnel connections.

Query Processing

The Sync App offloads as much of the SELECT statement processing as possible to Google BigQuery and then processes the rest of the query in memory (client-side).

See Query Processing for more information.

Logging

See Logging for an overview of configuration settings that can be used to refine CData logging. For basic logging, you only need to set two connection properties, but there are numerous features that support more refined logging, where you can select subsets of information to be logged using the LogModules connection property.

SSL Configuration

Customizing the SSL Configuration

By default, the Sync App attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store.

To specify another certificate, see the SSLServerCert property for the available formats to do so.

Firewall and Proxy

Connecting Through a Firewall or Proxy

HTTP Proxies

To connect through the Windows system proxy, you do not need to set any additional connection properties. To connect to other proxies, set ProxyAutoDetect to false.

To authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword in addition to ProxyServer and ProxyPort.

Other Proxies

Set the following properties:

  • To use a proxy-based firewall, set FirewallType, FirewallServer, and FirewallPort.
  • To tunnel the connection, set FirewallType to TUNNEL.
  • To authenticate, specify FirewallUser and FirewallPassword.
  • To authenticate to a SOCKS proxy, additionally set FirewallType to SOCKS5.

Data Model

The CData Sync App models the data as defined within Google BigQuery for the ProjectId and DatasetId configured.

Views

Views are client-side tables that cannot be modified. The Sync App uses these to report metadata about the Google BigQuery projects and datasets it is connected to.

In addition, the Sync App supports server-side views defined within Google BigQuery. These views may be used in SELECT statements the same way as tables. However, view schemas can easily become out of date and require the Sync App to refresh them. Please see RefreshViewSchemas for more details.

External Data Sources

Google BigQuery allows creating external datasets that store data in Amazon S3 regions (like aws-us-east-1) or Azure Storage regions (like azure-useast2). The Sync App supports these datasets with two major limitations:

  1. Google BigQuery treats external tables as read-only. You cannot execute INSERT, UPDATE or DELETE queries on them.
  2. Google BigQuery does not support the Storage API for external datasets. You must disable the UseStorageAPI option in order to query them. This limits the read throughput of the Sync App, so if you are executing large queries, it is recommended that you copy your data into Google BigQuery for the best performance.
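
The second limitation can be captured in a small pre-flight check. The sketch below decides whether to leave UseStorageAPI enabled based on a dataset's location. Detecting external datasets by the "aws-" and "azure-" prefixes is an assumption drawn only from the two example regions above, and use_storage_api_for is a hypothetical helper.

```python
# Assumed prefixes for external regions, inferred from the examples
# aws-us-east-1 and azure-useast2.
EXTERNAL_REGION_PREFIXES = ("aws-", "azure-")


def use_storage_api_for(dataset_location: str) -> bool:
    """Return False for locations that look like external (non-Google) regions."""
    return not dataset_location.lower().startswith(EXTERNAL_REGION_PREFIXES)


for location in ("US", "aws-us-east-1", "azure-useast2"):
    print(location, use_storage_api_for(location))
```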

Stored Procedures

Stored Procedures are function-like interfaces to the data source. The Sync App uses these to manage Google BigQuery tables and jobs and to perform OAuth operations.

In addition to the client-side stored procedures offered by the Sync App, there is also support for server-side stored procedures defined in Google BigQuery. The Sync App supports both CALL and EXEC using the procedure's parameter names. Note that the Sync App only supports IN parameters and result set return values.

CALL `psychic-valve-137816`.Northwind.MostPopularProduct()
CALL `psychic-valve-137816`.Northwind.GetStockedValue(24, 0.75)

EXEC `psychic-valve-137816`.Northwind.MostPopularProduct
EXEC `psychic-valve-137816`.Northwind.GetStockedValue productId = 24, discountRate = 0.75

Additional Metadata

Table Descriptions

Google BigQuery supports setting descriptions on tables but the Sync App does not report these by default. ShowTableDescriptions can be used to report table descriptions.

Primary Keys

Google BigQuery does not support primary keys natively, but the Sync App allows you to define them so they can be used in environments that require primary keys to modify data. Primary keys can be defined using the PrimaryKeyIdentifiers option.

Policy Tags

If policy tags from the Data Catalog service are defined on a table, they can be retrieved from the system tables using the PolicyTags column:

SELECT ColumnName, PolicyTags FROM sys_tablecolumns
WHERE CatalogName = 'psychic-valve-137816'
AND SchemaName = 'Northwind'
AND TableName = 'Customers'

Tables

Tables

Table definitions are dynamically generated based on the table definitions within Google BigQuery for the Project and Dataset specified in the connection string options.

Views

Views are similar to tables in the way that data is represented; however, views are read-only.

Queries can be executed against a view as if it were a normal table.

Google BigQuery Connector for CData Sync Views

Name              Description
Datasets          Lists all the accessible datasets for a given project.
PartitionsList    Lists the partitioning definitions for tables.
PartitionsValues  Lists the partitioning ranges for tables.
Projects          Lists all the projects for the authorized user.

Datasets

Lists all the accessible datasets for a given project.

Columns

Name                        Type    Description
Id [KEY]                    String  The fully qualified, unique, opaque Id of the dataset.
Kind                        String  The resource type.
FriendlyName                String  A descriptive name for the dataset.
DatasetReference_ProjectId  String  A unique reference to the container project.
DatasetReference_DatasetId  String  A unique reference to the dataset, without the project name.

PartitionsList

Lists the partitioning definitions for tables.

Columns

Name           Type     Description
Id [KEY]       String   A unique identifier for the partition.
ProjectId      String   The project that the table belongs to.
DatasetId      String   The dataset that the table belongs to.
TableName      String   The name of the table.
ColumnName     String   The name of the column used for partitioning.
ColumnType     String   The type of the partitioning column.
Kind           String   The type of partitioning used by the table. One of DATE, RANGE or INGESTION.
RequireFilter  Boolean  Whether a filter on the partition column is required to query the table.

PartitionsValues

Lists the partitioning ranges for tables.

Columns

Name            Type    Description
Id              String  A unique identifier for the partition.
RangeLow        String  The lowest value of the partition column. Either an integer when Kind is RANGE, or a date otherwise.
RangeHigh       String  The highest value of the partition column. Either an integer when Kind is RANGE, or a date otherwise.
RangeInterval   String  The range of values which are included in each partition. Only valid when Kind is RANGE.
DateResolution  String  How much of the date is significant to a TIME or INGESTION partition column. One of DAY, HOUR, MONTH or YEAR.

Projects

Lists all the projects for the authorized user.

Columns

Name                        Type    Description
Id [KEY]                    String  The unique identifier of the project.
Kind                        String  The resource type.
FriendlyName                String  A descriptive name for the project.
NumericId                   String  The numeric Id of the project.
ProjectReference_ProjectId  String  A unique reference to the project.

Data Type Mapping

Data Type Mappings

The Sync App maps types from the data source to the corresponding data type available in the schema. The table below documents these mappings.

Google BigQuery  CData Schema
STRING           string
BYTES            binary
INTEGER          long
FLOAT            double
NUMERIC          decimal
BIGNUMERIC       decimal
BOOLEAN          bool
DATE             date
TIME             time
DATETIME         datetime
TIMESTAMP        datetime
STRUCT           See below
ARRAY            See below
GEOGRAPHY        string
JSON             string
INTERVAL         string

Note that the NUMERIC type supports 38 digits of precision and the BIGNUMERIC type supports 76 digits of precision. Most platforms do not have a decimal type that supports the full precision of these values (.NET decimal supports 28 digits, and Java BigDecimal supports 38 by default). If this is the case, then you can cast these columns to a string when queried, or the connection can be configured to ignore them by setting IgnoreTypes=decimal.
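
The effect of a limited decimal type can be reproduced with a short experiment. The sketch below uses Python's decimal module with a 28-digit context as a stand-in for a platform decimal with 28 digits of precision. It is only an illustration of the rounding, not a model of any specific platform, and the sample value is made up.

```python
from decimal import Context, Decimal

# A made-up value that uses 38 significant digits.
numeric_text = "12345678901234567890123456789.123456789"

# Decimal(str) keeps every digit, much like reading the column as a string.
exact = Decimal(numeric_text)

# A context limited to 28 significant digits rounds the value on conversion.
limited = Context(prec=28).create_decimal(numeric_text)

print(str(exact) == numeric_text)  # the text round-trips without loss
print(limited == exact)            # the 28-digit value no longer matches
```

This is why casting such columns to a string is the safer choice when the destination type cannot hold all of the digits.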

STRUCT and ARRAY Types

Google BigQuery supports two kinds of types for storing compound values in a single row, STRUCT and ARRAY. In some places within Google BigQuery these are also known as RECORD and REPEATED types.

A STRUCT is a fixed-size group of values that are accessed by name and can have different types. The Sync App flattens structs so their individual fields can be accessed using dotted names. Note that these dotted names must be quoted.

-- trade_value STRUCT<currency STRING, value FLOAT>
SELECT CONCAT([trade_value.value], ' ', NULLIF([trade_value.currency], 'USD'))
FROM trades

An ARRAY is a group of values with the same type that can have any size. The Sync App treats the array as a single compound value and reports it as a JSON aggregate.

These types may be combined such that a STRUCT type contains an ARRAY field, or an ARRAY field is a list of STRUCT values. The outer type takes precedence in how the field is processed:

/* Table contains fields: 
  stocks STRUCT<symbol STRING, prices ARRAY<FLOAT>>
  offers: ARRAY<STRUCT<currency STRING, value FLOAT>> 
*/

SELECT [stocks.symbol], /* ARRAY field can be read from STRUCT, but is converted to JSON */
       [stocks.prices], 
       [offers]         /* STRUCT fields in an ARRAY cannot be accessed */
FROM market
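
The flattening rules described above can be imitated with a short sketch. It treats Python dicts as STRUCT values and lists as ARRAY values, and it is only an illustration of the documented behavior, not the Sync App's implementation. flatten_row and the sample row are hypothetical.

```python
import json


def flatten_row(row: dict, prefix: str = "") -> dict:
    """Flatten dicts (STRUCTs) into dotted column names; lists (ARRAYs) become JSON text."""
    flat = {}
    for name, value in row.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            # STRUCT: recurse so each field gets its own dotted column.
            flat.update(flatten_row(value, prefix=f"{column}."))
        elif isinstance(value, list):
            # ARRAY: kept as one compound value, even if it holds STRUCTs.
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


market_row = {
    "stocks": {"symbol": "ACME", "prices": [10.5, 11.0]},
    "offers": [{"currency": "USD", "value": 9.75}],
}
flat = flatten_row(market_row)
print(sorted(flat))
```

The resulting columns are stocks.symbol, stocks.prices (as JSON), and offers (as JSON), which mirrors the rule that the outer type decides how a field is processed.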

INTERVAL Types

The Sync App represents INTERVAL types as strings. Whenever a query requires an INTERVAL type, it must specify the INTERVAL using the BigQuery SQL INTERVAL format:

YEAR-MONTH DAY HOUR:MINUTE:SECOND.FRACTION

All queries that return INTERVAL values use this format unless they appear in an ARRAY aggregate, where the format depends upon how the Sync App reads the data.

For example, the value "5 years and 11 months, minus 10 days and 3 hours and 2.5 seconds" in the correct format is:

5-11 -10 -3:0:2.5
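
A value in the YEAR-MONTH DAY HOUR:MINUTE:SECOND.FRACTION layout can be produced programmatically. The sketch below assumes the interval is held as three signed totals (months, days, and seconds), which is a modeling choice made for this example. format_interval is a hypothetical helper.

```python
from decimal import Decimal


def format_interval(months: int, days: int, seconds: Decimal) -> str:
    """Format signed month, day, and second totals as Y-M D H:M:S.F."""
    ym_sign = "-" if months < 0 else ""
    years, rem_months = divmod(abs(months), 12)

    time_sign = "-" if seconds < 0 else ""
    hours, rem_seconds = divmod(abs(seconds), 3600)
    minutes, secs = divmod(rem_seconds, 60)
    # normalize() drops trailing zeros; the "f" format avoids exponent notation.
    secs_text = format(secs.normalize(), "f")

    return (f"{ym_sign}{years}-{rem_months} {days} "
            f"{time_sign}{int(hours)}:{int(minutes)}:{secs_text}")


# 5 years and 11 months, minus 10 days, minus 3 hours and 2.5 seconds.
text = format_interval(5 * 12 + 11, -10, Decimal("-10802.5"))
print(text)
```

Passing the seconds as a Decimal rather than a float keeps the fractional part exact when it is written into the string.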

Type Parameters

The Sync App exposes parameters on the following types. In each case the type parameters are optional; Google BigQuery has default values for types that are not parameterized.

  • STRING(length)
  • BYTES(length)
  • NUMERIC(precision) or NUMERIC(precision, scale)
  • BIGNUMERIC(precision) or BIGNUMERIC(precision, scale)

These parameters are primarily for restricting the data written to the table. They are included in the table metadata as the column size for STRING and BYTES, and the numeric precision and scale for NUMERIC and BIGNUMERIC.

Type parameters have no effect on queries and are not reported within query metadata. For example, in the query below the output of CONCAT is a plain STRING even though its inputs, a and b, are both STRING(100).

SELECT CONCAT(a, b) FROM table_with_length_params

Connection String Options

The connection string properties are the various options that can be used to establish a connection. This section provides a complete list of the options you can configure in the connection string for this provider.

For more information on establishing a connection, see Establishing a Connection.

Authentication


Property    Description
AuthScheme  The type of authentication to use when connecting to Google BigQuery.
ProjectId   The ProjectId used to resolve unqualified tables and execute jobs.
DatasetId   The DatasetId used to resolve unqualified tables.

BigQuery


Property                Description
AllowLargeResultSets    Whether or not to allow large result sets to be stored in temporary tables.
UseQueryCache           Specifies whether to use Google BigQuery's built-in query cache.
PageSize                The number of results to return per page from Google BigQuery.
PollingInterval         Determines how long to wait, in seconds, between checks to see if a job has completed.
AllowUpdatesWithoutKey  Whether or not to allow updates without primary keys.
FilterColumns           Set AllowUpdatesWithoutKey to true before using this property.
UseLegacySQL            Specifies whether to use BigQuery's legacy SQL dialect for this query. By default, Standard SQL will be used.

Storage API


Property          Description
UseStorageAPI     Specifies whether to use BigQuery's Storage API for bulk data reads.
UseArrowFormat    Specifies whether to use the Arrow format with BigQuery's Storage API.
StorageThreshold  The minimum number of rows a query must return to invoke the Storage API.
StoragePageSize   Specifies the page size to use for Storage API queries.

Uploading


Property             Description
InsertMode           Specifies what kind of method to use when inserting data. By default streaming INSERTs are used.
WaitForBatchResults  Whether to wait for the job to complete when using the bulk upload API. Only active when InsertMode is set to Upload.
GCSBucket            Specifies the name of a GCS bucket to upload bulk data for staging.
GCSBucketFolder      Specifies the name of the folder in GCSBucket to upload bulk data for staging.
TempTableDataset     The prefix of the dataset that will contain temporary tables when performing bulk UPDATE or DELETE operations.

OAuth


Property           Description
OAuthClientId      The client Id assigned when you register your application with an OAuth authorization server.
OAuthClientSecret  The client secret assigned when you register your application with an OAuth authorization server.

JWT OAuth


Property              Description
OAuthJWTCert          The JWT Certificate store.
OAuthJWTCertType      The type of key store containing the JWT Certificate.
OAuthJWTCertPassword  The password for the OAuth JWT certificate.
OAuthJWTCertSubject   The subject of the OAuth JWT certificate.
OAuthJWTIssuer        The issuer of the JSON Web Token.
OAuthJWTSubject       The user subject for which the application is requesting delegated access.

SSL


Property       Description
SSLServerCert  The certificate to be accepted from the server when connecting using TLS/SSL.

Firewall


Property          Description
FirewallType      The protocol used by a proxy-based firewall.
FirewallServer    The name or IP address of a proxy-based firewall.
FirewallPort      The TCP port for a proxy-based firewall.
FirewallUser      The user name to use to authenticate with a proxy-based firewall.
FirewallPassword  A password used to authenticate to a proxy-based firewall.

Proxy


Property         Description
ProxyAutoDetect  This indicates whether to use the system proxy settings or not.
ProxyServer      The hostname or IP address of a proxy to route HTTP traffic through.
ProxyPort        The TCP port the ProxyServer proxy is running on.
ProxyAuthScheme  The authentication type to use to authenticate to the ProxyServer proxy.
ProxyUser        A user name to be used to authenticate to the ProxyServer proxy.
ProxyPassword    A password to be used to authenticate to the ProxyServer proxy.
ProxySSLType     The SSL type to use when connecting to the ProxyServer proxy.
ProxyExceptions  A semicolon separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer.

Logging


Property    Description
LogModules  Core modules to be included in the log file.

Schema


Property               Description
Location               A path to the directory that contains the schema files defining tables, views, and stored procedures.
BrowsableSchemas       This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.
Tables                 This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC.
Views                  This property restricts the views reported to a subset of the available views. For example, Views=ViewA,ViewB,ViewC.
RefreshViewSchemas     Allows the provider to determine up-to-date view schemas automatically.
ShowTableDescriptions  Controls whether table descriptions are returned via the platform metadata APIs and sys_tables / sys_views.
PrimaryKeyIdentifiers  Set this property to define primary keys.
AllowedTableTypes      Specifies what kinds of tables will be visible.
FlattenObjects         Determines whether the provider flattens STRUCT fields into top-level columns.

Miscellaneous


Property                  Description
StorageTimeout            How long a Storage API connection must remain idle before the provider reconnects.
AllowAggregateParameters  Allows raw aggregates to be used in parameters when QueryPassthrough is enabled.
ApplicationName           An application name in the form application/version. For example, AcmeReporting/1.0.
AuditLimit                The maximum number of rows which will be stored within an audit table.
AuditMode                 What provider actions should be recorded to audit tables.
BigQueryOptions           A comma separated list of Google BigQuery options.
GenerateSchemaFiles       Indicates the user preference as to when schemas should be generated and saved.
MaximumBillingTier        The MaximumBillingTier is a positive integer that serves as a multiplier of the basic price per TB. For example, if you set MaximumBillingTier to 2, the maximum cost for that query will be 2x basic price per TB.
MaximumBytesBilled        Limits how many bytes BigQuery will allow a job to consume before it is cancelled.
MaxRows                   Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
Other                     These hidden properties are used only in specific use cases.
PseudoColumns             This property indicates whether or not to include pseudo columns as columns to the table.
QueryPassthrough          This option passes the query to the Google BigQuery server as is.
TableSamplePercent        This determines what percent of a table is sampled with the TABLESAMPLE operator.
Timeout                   The value in seconds until the timeout error is thrown, canceling the operation.
UserDefinedViews          A filepath pointing to the JSON configuration file containing your custom views.

Authentication

This section provides a complete list of the Authentication properties you can configure in the connection string for this provider.


Property    Description
AuthScheme  The type of authentication to use when connecting to Google BigQuery.
ProjectId   The ProjectId used to resolve unqualified tables and execute jobs.
DatasetId   The DatasetId used to resolve unqualified tables.

AuthScheme

The type of authentication to use when connecting to Google BigQuery.

Remarks

  • Auto: Lets the driver decide automatically based on the other connection properties you have set.
  • OAuth: Set this to perform OAuth authentication using a standard user account.
  • OAuthJWT: Set this to perform OAuth authentication using an OAuth service account.
  • GCPInstanceAccount: Set this to obtain an access token from a Google Cloud Platform instance.
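
As a minimal sketch of how these values fit into a connection string, the following Python helper joins properties in the Name=Value; form used elsewhere in this documentation. The helper name and every value shown (the key file path, project, and dataset) are hypothetical; only the property names come from this documentation.

```python
def build_connection_string(props: dict[str, str]) -> str:
    """Join property=value pairs with semicolons, keeping insertion order."""
    return ";".join(f"{name}={value}" for name, value in props.items()) + ";"


# Hypothetical values for a service account connection (AuthScheme=OAuthJWT).
service_account = build_connection_string({
    "AuthScheme": "OAuthJWT",
    "OAuthJWTCertType": "GOOGLEJSON",
    "OAuthJWTCert": "/path/to/service-account.json",  # placeholder path
    "ProjectId": "psychic-valve-137816",
    "DatasetId": "Northwind",
})
print(service_account)
```

A dictionary keeps each property name unique, which avoids accidentally setting the same property twice.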

ProjectId

The ProjectId used to resolve unqualified tables and execute jobs.

Remarks

This property and BillingProjectId are used to determine billing for jobs and resolve unqualified table names.

Job Execution

The Sync App must create a job within Google BigQuery to execute certain kinds of queries. For example, complex SELECT statements, UPDATE and DELETE statements, and INSERT statements (when InsertMode is DML) are all executed using jobs. The project where a job executes determines how the job is billed.

The Sync App determines the billing project using these rules. Note that only the first two rules apply when QueryPassthrough is enabled. Either this property or BillingProjectId must be set to execute passthrough queries.

  1. The BillingProjectId is used if that property is not empty.
  2. Then this property is used.
  3. If both properties are empty, the project is determined from the catalog of the first table in the query. The job created for the following query executes in the psychic-valve-137816 project.

SELECT FirstName, LastName FROM `psychic-valve-137816`.`Northwind`.`customers`
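
The billing rules above can be sketched as a small Python function. This is an illustration of the documented precedence only, not the Sync App's implementation; the function and parameter names are hypothetical, and raising an error when no project can be determined is an assumption.

```python
from typing import Optional


def resolve_billing_project(
    billing_project_id: Optional[str],
    project_id: Optional[str],
    first_table_catalog: Optional[str],
    query_passthrough: bool = False,
) -> str:
    """Pick the project that a job is billed to, following the three rules."""
    if billing_project_id:          # Rule 1: BillingProjectId wins when set.
        return billing_project_id
    if project_id:                  # Rule 2: otherwise fall back to ProjectId.
        return project_id
    # Rule 3 is skipped for passthrough queries, as noted in the text.
    if not query_passthrough and first_table_catalog:
        return first_table_catalog
    raise ValueError("no billing project could be determined")
```

For the query above, with both properties empty and QueryPassthrough disabled, the function returns psychic-valve-137816.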

Table Resolution

In addition to setting the billing project, the Sync App also uses this property to determine the default data project. The data project is used to resolve tables included in queries when they are not fully qualified:

/* Unqualified, resolved against connection properties */
SELECT FirstName, LastName FROM `Northwind`.`customers`

/* Qualified, project specified as catalog */
SELECT FirstName, LastName FROM `psychic-valve-137816`.`Northwind`.`customers`

Any unqualified table references in the query are resolved using the following rules. Note that only methods 1 and 2 are supported when QueryPassthrough is enabled. This means that any tables outside the default data project must be explicitly qualified.

  1. This property is used if it is not empty.
  2. Then the BillingProjectId property is used.
  3. If both properties are empty, the catalog from the first table in the query is used. In the following query the `Northwind`.`orders` table is treated as if it comes from the psychic-valve-137816 project.

SELECT ... FROM `psychic-valve-137816`.`Northwind`.`customers`
INNER JOIN `Northwind`.`orders`
ON ...

DatasetId

The DatasetId used to resolve unqualified tables.

Remarks

When a query refers to a table it can leave the dataset implicit, or qualify the dataset directly as the schema portion of the table:

/* Implicit, resolved against connection string */
SELECT FirstName, LastName FROM `customers`

/* Explicit, dataset specified as schema */
SELECT FirstName, LastName FROM `psychic-valve-137816`.`Northwind`.`customers`

Any unqualified table references in the query are resolved using the following rules. Note that only method 1 is supported when QueryPassthrough is enabled. This means that passthrough queries must set this property or qualify all tables.

  1. If this property is set then the specified dataset is used.
  2. Otherwise the schema from the first table in the query is used. In the following query the `orders` table is treated as if it comes from the Northwind dataset.

SELECT ... FROM `psychic-valve-137816`.`Northwind`.`customers`
INNER JOIN `orders`
ON ...
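
The dataset rules above can be sketched as follows. This is a hypothetical illustration of the documented precedence, not the Sync App's code; the function names are invented for the example, and raising an error when nothing can be resolved is an assumption.

```python
from typing import Optional


def resolve_dataset(
    dataset_id: Optional[str],
    first_table_schema: Optional[str],
    query_passthrough: bool = False,
) -> str:
    """Pick the dataset used for unqualified table references."""
    if dataset_id:                  # Method 1: the DatasetId property.
        return dataset_id
    # Method 2 is unavailable when QueryPassthrough is enabled.
    if not query_passthrough and first_table_schema:
        return first_table_schema
    raise ValueError("set DatasetId or qualify every table")


def qualify(table: str, dataset: str) -> str:
    """Render a dataset-qualified table name with backtick quoting."""
    return f"`{dataset}`.`{table}`"


# The join example above: DatasetId is empty, the first table is in Northwind.
orders = qualify("orders", resolve_dataset(None, "Northwind"))
```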

BigQuery

This section provides a complete list of the BigQuery properties you can configure in the connection string for this provider.


Property                Description
AllowLargeResultSets    Whether to allow large result sets to be stored in temporary tables.
UseQueryCache           Specifies whether to use Google BigQuery's built-in query cache.
PageSize                The number of results to return per page from Google BigQuery.
PollingInterval         How long to wait, in seconds, between checks to see whether a job has completed.
AllowUpdatesWithoutKey  Whether to allow updates without primary keys.
FilterColumns           Set AllowUpdatesWithoutKey to true before using this property.
UseLegacySQL            Specifies whether to use BigQuery's legacy SQL dialect for this query. By default, Standard SQL is used.

AllowLargeResultSets

Whether to allow large result sets to be stored in temporary tables.

Remarks

When this property is enabled, large result sets are stored in temporary tables. When UseLegacySQL is false, the value of this property is ignored and queries run as if it were true.

UseQueryCache

Specifies whether to use Google BigQuery's built-in query cache.

Remarks

Google BigQuery will cache the results of recent queries, and will use this cache for queries by default. Google BigQuery automatically updates the cache when a table is modified, so performance is generally better without any risk of queries returning stale data.

If this is set to false, the query is always run against the table directly.

PageSize

The number of results to return per page from Google BigQuery.

Remarks

PageSize controls the number of results returned per page from Google BigQuery. A larger page size returns more data in a single HTTP request, but each request may take longer to execute. A smaller page size increases the number of HTTP requests needed to retrieve all the data, but is generally recommended to ensure timeout exceptions do not occur.

Note that this option has no effect if UseStorageAPI is enabled and the queries being executed can be executed on the Storage API. See StoragePageSize for more information.
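
To make the trade-off concrete, the following Python sketch estimates how many page requests a result set needs for a given page size. It assumes every request returns a full page except possibly the last; the function name and the row counts are hypothetical.

```python
def request_count(total_rows: int, page_size: int) -> int:
    """Number of page requests needed to read total_rows rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer ceiling division: a partial last page still costs one request.
    return (total_rows + page_size - 1) // page_size


small_pages = request_count(1_000_000, 10_000)   # many short requests
large_pages = request_count(1_000_000, 100_000)  # fewer, longer requests
```

With these example numbers, a tenfold larger page size cuts the request count tenfold, at the cost of each request carrying ten times as many rows.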

PollingInterval

How long to wait, in seconds, between checks to see whether a job has completed.

Remarks

This only applies to queries which are stored to a table instead of streamed directly to the Sync App. This applies in only three cases:

  • DestinationTable is set.
  • AllowLargeResultSets is true and the query takes longer than Timeout seconds.
  • UseStorageAPI is enabled and the query is complex.

This property determines how long to wait between checks of whether the query's results are ready. Very large result sets or complex queries may take longer to process, and a low polling interval may result in many unnecessary requests being made to check the query status.
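
The effect of the polling interval on the number of status checks can be sketched with a generic polling loop. This is not the Sync App's implementation: wait_for_job, its timeout parameter, and the simulated ten-second job are all hypothetical, and a fake clock stands in for real waiting.

```python
import time
from typing import Callable


def wait_for_job(
    is_done: Callable[[], bool],
    polling_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll is_done until it returns True; return the number of checks made."""
    checks = 0
    waited = 0.0
    while True:
        checks += 1
        if is_done():
            return checks
        if waited >= timeout:
            raise TimeoutError(f"job not finished after {waited:.0f} seconds")
        sleep(polling_interval)
        waited += polling_interval


def simulate(polling_interval: float) -> int:
    """Run the loop against a job that finishes 10 simulated seconds in."""
    clock = {"now": 0.0}

    def fake_sleep(seconds: float) -> None:
        clock["now"] += seconds  # advance the fake clock instead of sleeping

    return wait_for_job(lambda: clock["now"] >= 10, polling_interval,
                        timeout=60, sleep=fake_sleep)


checks_with_1s = simulate(1)  # short interval: more status requests
checks_with_5s = simulate(5)  # long interval: fewer status requests
```

In this simulation the one-second interval issues more status checks than the five-second interval for the same job, which is the overhead the paragraph above describes.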

AllowUpdatesWithoutKey

Whether to allow updates without primary keys.

Remarks

Set this property to true to allow updates without primary keys.

FilterColumns

Set `AllowUpdatesWithoutKey` to true before using this property.

Remarks

`AllowUpdatesWithoutKey` must be set to true before this property can be used.

Set the property like this:

`filterColumns=col1[,col2[,col3]];`

UseLegacySQL

Specifies whether to use BigQuery's legacy SQL dialect for this query. By default, Standard SQL will be used.

Remarks

If set to true, the query will use BigQuery's Legacy SQL dialect to rebuild the query. If set to false, the query will use BigQuery's standard SQL: https://cloud.google.com/bigquery/sql-reference/.
When UseLegacySQL is set to false, the value of AllowLargeResultSets is ignored. The query will be run as if AllowLargeResultSets is true.

Storage API

This section provides a complete list of the Storage API properties you can configure in the connection string for this provider.


Property          Description
UseStorageAPI     Specifies whether to use BigQuery's Storage API for bulk data reads.
UseArrowFormat    Specifies whether to use the Arrow format with BigQuery's Storage API.
StorageThreshold  The minimum number of rows a query must return to invoke the Storage API.
StoragePageSize   Specifies the page size to use for Storage API queries.

UseStorageAPI

Specifies whether to use BigQuery's Storage API for bulk data reads.

Remarks

By default, the Sync App uses the Storage API instead of the REST API. Depending upon the complexity of the query, the Sync App may execute the query in one of two ways:

  • Simple queries that read all columns from only one table, and have no extra clauses except LIMIT, are executed directly within the Storage API.
  • All other queries are executed as a query job which writes to a temporary table. Once the query is complete, the results are read from the temporary table using the Storage API.

The BigQuery Storage API can read data faster and more efficiently than the REST API (accessible by setting this option to false), but is priced differently and requires extra OAuth permissions when using your own OAuth app. It also uses the separate StoragePageSize property instead of PageSize.

The BigQuery REST API requires no extra permissions and uses standard pricing, but is slower than the Storage API.

UseArrowFormat

Specifies whether to use the Arrow format with BigQuery's Storage API.

Remarks

This property only has an effect when UseStorageAPI is enabled. When performing reads against the Storage API, the Sync App can request data in different formats. By default it uses Avro, but enabling this option makes it use Arrow.

This option should be enabled when working with time series data or other datasets that have many date, time, datetime or timestamp fields. For these datasets, using Arrow can bring noticeable improvements over using Avro. Otherwise Avro and Arrow read times are very close and switching between them is unlikely to make a significant difference.

StorageThreshold

The minimum number of rows a query must return to invoke the Storage API.

Remarks

When the Sync App receives a query too complex to be run directly in the Storage API, it creates a query job and uses the Storage API to read from the query results table. If the query job returns fewer than the number of rows provided in this option, then the results are returned directly and the Storage API is not used.

This value should be set between 1 and 100000. Higher values will use the Storage API only for large resultsets, but will be delayed by reading more results from the query job. Lower values will result in smaller delays but will use the Storage API for more queries.

Note that this option only has an effect if UseStorageAPI is enabled and the queries being executed cannot be executed directly on the Storage API. Queries which run directly on the Storage API never create query jobs.
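
The interaction between UseStorageAPI and StorageThreshold can be summarized as a decision function. This is a sketch of the behavior described in this documentation, not the Sync App's code; the function name, the parameter names, and the path labels it returns are hypothetical.

```python
def choose_read_path(
    use_storage_api: bool,
    is_simple_query: bool,
    job_row_count: int,
    storage_threshold: int,
) -> str:
    """Return a label for how a query's results would be read."""
    if not use_storage_api:
        return "rest-api"
    if is_simple_query:
        # Simple single-table reads run directly on the Storage API and
        # never create a query job, so the threshold is not consulted.
        return "storage-api-direct"
    if job_row_count < storage_threshold:
        # Small results are returned straight from the query job.
        return "query-job-results"
    # Large results are read from the job's temporary table via Storage.
    return "query-job-then-storage-api"
```

For example, with a threshold of 100000, a complex query returning 5000 rows is served from the query job, while one returning 250000 rows is read through the Storage API.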

StoragePageSize

Specifies the page size to use for Storage API queries.

Remarks

When UseStorageAPI is enabled and the query being executed can be run on the Storage API, this option controls how many rows the Sync App is allowed to buffer on the client.

A higher value will generally make queries faster at the expense of consuming more memory, while lower values will conserve memory but make queries slower.

Uploading

This section provides a complete list of the Uploading properties you can configure in the connection string for this provider.


Property             Description
InsertMode           Specifies what kind of method to use when inserting data. By default streaming INSERTs are used.
WaitForBatchResults  Whether to wait for the job to complete when using the bulk upload API. Only active when InsertMode is set to Upload.
GCSBucket            Specifies the name of a GCS bucket to upload bulk data for staging.
GCSBucketFolder      Specifies the name of the folder in GCSBucket to upload bulk data for staging.
TempTableDataset     The prefix of the dataset that will contain temporary tables when performing bulk UPDATE or DELETE operations.

InsertMode

Specifies what kind of method to use when inserting data. By default streaming INSERTs are used.

Remarks

This section provides only a summary of the mechanisms that each of these modes use. Please see Advanced Integrations for more details on how to use each of these modes.

  • Streaming uses the Google BigQuery streaming API (also called insertAll).
  • DML uses the Google BigQuery query API to generate INSERT SQL statements which insert individual rows.
  • Upload uses the Google BigQuery upload API to create a load job which copies the rows from temporary server-side storage.
  • GCSStaging is similar to the Upload mode except that it uses your Google Cloud Storage account instead of public storage.

When UseLegacySQL is true only Streaming and Upload modes are allowed. The Legacy SQL dialect does not support DML statements.
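
The restriction above can be expressed as a small validation check. This is a hypothetical sketch, not part of the product; the function name is invented, and treating the mode names as case-sensitive is an assumption.

```python
INSERT_MODES = ("Streaming", "DML", "Upload", "GCSStaging")
# Per the text above, only these modes are allowed with UseLegacySQL=true.
LEGACY_SQL_INSERT_MODES = ("Streaming", "Upload")


def is_insert_mode_allowed(insert_mode: str, use_legacy_sql: bool) -> bool:
    """Check an InsertMode value against the UseLegacySQL setting."""
    if insert_mode not in INSERT_MODES:
        return False
    if use_legacy_sql:
        return insert_mode in LEGACY_SQL_INSERT_MODES
    return True
```

Running such a check before a job starts surfaces an incompatible combination, such as DML with UseLegacySQL enabled, as a configuration error rather than a failed insert.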

WaitForBatchResults

Whether to wait for the job to complete when using the bulk upload API. Only active when InsertMode is set to Upload.

Remarks

This property determines whether the Sync App waits for batch jobs to report their status. By default, this property is true and INSERT queries complete only once Google BigQuery has finished executing them. When this property is false, the INSERT query completes as soon as a job is submitted for it.

The default mode is recommended for reliability:

  1. INSERTs will never fail silently. If the Sync App does not wait for the job to finish, it will never receive an error if the job failed to execute.
  2. If the INSERT batch size is small enough, the Sync App may submit jobs quickly enough that it hits Google BigQuery's load job limits. This does not happen when waiting for batch results because the Sync App will not allow more than one job to execute at the same time on the same connection.

You can disable this option to achieve lower delays when inserting, but you must also make sure to obey the Google BigQuery rate limits and check the status of each job to determine whether it has succeeded or failed.

GCSBucket

Specifies the name of a GCS bucket to upload bulk data for staging.

Remarks

This property applies only when InsertMode is set to GCSStaging, in which case it is required.

GCSBucketFolder

Specifies the name of the folder in GCSBucket to upload bulk data for staging.

Remarks

This property applies only when InsertMode is set to GCSStaging. If it is not set, the Sync App defaults to writing to the root of the bucket.
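
How GCSBucket and GCSBucketFolder combine can be sketched as follows. The function name is hypothetical, and the object names the Sync App actually writes under this location are not documented here; the sketch only shows the required bucket and the fallback to the bucket root, using the standard gs:// URI form for Google Cloud Storage.

```python
from typing import Optional


def staging_location(
    insert_mode: str,
    gcs_bucket: Optional[str],
    gcs_bucket_folder: Optional[str] = None,
) -> Optional[str]:
    """Return the gs:// prefix used for staging, or None if not staging."""
    if insert_mode != "GCSStaging":
        return None  # both properties are ignored in the other modes
    if not gcs_bucket:
        raise ValueError("GCSBucket is required when InsertMode=GCSStaging")
    folder = (gcs_bucket_folder or "").strip("/")
    # With no folder configured, data is written to the bucket root.
    return f"gs://{gcs_bucket}/{folder}/" if folder else f"gs://{gcs_bucket}/"
```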

TempTableDataset

The prefix of the dataset that will contain temporary tables when performing bulk UPDATE or DELETE operations.

Remarks

Internally bulk UPDATE and DELETE use Google BigQuery MERGE queries, which require creating a table to hold all the update operations. This option is used along with the target table's region to determine the name of the dataset where these temporary tables are created. Each region must have its own temporary dataset so that the temporary table and the MERGE table can be stored in the same project/dataset. This avoids unnecessary data transfer charges.

For example, the Sync App would create a dataset called "_CDataTempTableDataset_US" for tables in the US region and a dataset called "_CDataTempTableDataset_asia_southeast_1" for tables in the Singapore region.

OAuth

This section provides a complete list of the OAuth properties you can configure in the connection string for this provider.


Property           Description
OAuthClientId      The client Id assigned when you register your application with an OAuth authorization server.
OAuthClientSecret  The client secret assigned when you register your application with an OAuth authorization server.

OAuthClientId

The client Id assigned when you register your application with an OAuth authorization server.

Remarks

As part of registering an OAuth application, you will receive the OAuthClientId value, sometimes also called a consumer key, and a client secret, the OAuthClientSecret.

OAuthClientSecret

The client secret assigned when you register your application with an OAuth authorization server.

Remarks

As part of registering an OAuth application, you will receive the OAuthClientId, also called a consumer key. You will also receive a client secret, also called a consumer secret. Set the client secret in the OAuthClientSecret property.

JWT OAuth

This section provides a complete list of the JWT OAuth properties you can configure in the connection string for this provider.


Property              Description
OAuthJWTCert          The JWT Certificate store.
OAuthJWTCertType      The type of key store containing the JWT Certificate.
OAuthJWTCertPassword  The password for the OAuth JWT certificate.
OAuthJWTCertSubject   The subject of the OAuth JWT certificate.
OAuthJWTIssuer        The issuer of the JSON Web Token.
OAuthJWTSubject       The user subject for which the application is requesting delegated access.

OAuthJWTCert

The JWT Certificate store.

Remarks

The name of the certificate store for the client certificate.

The OAuthJWTCertType field specifies the type of the certificate store specified by OAuthJWTCert. If the store is password protected, specify the password in OAuthJWTCertPassword.

OAuthJWTCert is used in conjunction with the OAuthJWTCertSubject field in order to specify client certificates. If OAuthJWTCert has a value, and OAuthJWTCertSubject is set, a search for a certificate is initiated. Please refer to the OAuthJWTCertSubject field for details.

Designations of certificate stores are platform-dependent.

The following are designations of the most common User and Machine certificate stores in Windows:

MY    A certificate store holding personal certificates with their associated private keys.
CA    Certifying authority certificates.
ROOT  Root certificates.
SPC   Software publisher certificates.

In Java, the certificate store normally is a file containing certificates and optional private keys.

When the certificate store type is PFXFile, this property must be set to the name of the file. When the type is PFXBlob, the property must be set to the binary contents of a PFX file (i.e. PKCS12 certificate store).

OAuthJWTCertType

The type of key store containing the JWT Certificate.

Remarks

This property can take one of the following values:

USER                For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note: This store type is not available in Java.
MACHINE             For Windows, this specifies that the certificate store is a machine store. Note: this store type is not available in Java.
PFXFILE             The certificate store is the name of a PFX (PKCS12) file containing certificates.
PFXBLOB             The certificate store is a string (base-64-encoded) representing a certificate store in PFX (PKCS12) format.
JKSFILE             The certificate store is the name of a Java key store (JKS) file containing certificates. Note: this store type is only available in Java.
JKSBLOB             The certificate store is a string (base-64-encoded) representing a certificate store in Java key store (JKS) format. Note: this store type is only available in Java.
PEMKEY_FILE         The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate.
PEMKEY_BLOB         The certificate store is a string (base64-encoded) that contains a private key and an optional certificate.
PUBLIC_KEY_FILE     The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate.
PUBLIC_KEY_BLOB     The certificate store is a string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate.
SSHPUBLIC_KEY_FILE  The certificate store is the name of a file that contains an SSH-style public key.
SSHPUBLIC_KEY_BLOB  The certificate store is a string (base-64-encoded) that contains an SSH-style public key.
P7BFILE             The certificate store is the name of a PKCS7 file containing certificates.
PPKFILE             The certificate store is the name of a file that contains a PPK (PuTTY Private Key).
XMLFILE             The certificate store is the name of a file that contains a certificate in XML format.
XMLBLOB             The certificate store is a string that contains a certificate in XML format.
GOOGLEJSON          The certificate store is the name of a JSON file containing the service account information. Only valid when connecting to a Google service.
GOOGLEJSONBLOB      The certificate store is a string that contains the service account JSON. Only valid when connecting to a Google service.

OAuthJWTCertPassword

The password for the OAuth JWT certificate.

Remarks

If the certificate store is of a type that requires a password, this property is used to specify that password in order to open the certificate store.

This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys are not encrypted.

OAuthJWTCertSubject

The subject of the OAuth JWT certificate.

Remarks

When loading a certificate the subject is used to locate the certificate in the store.

If an exact match is not found, the store is searched for subjects containing the value of the property.

If a match is still not found, the property is set to an empty string, and no certificate is selected.

The special value "*" picks the first certificate in the certificate store.

The certificate subject is a comma separated list of distinguished name fields and values. For instance "CN=www.server.com, OU=test, C=US, [email protected]". Common fields and their meanings are displayed below.

Field  Meaning
CN     Common Name. This is commonly a host name like www.server.com.
O      Organization
OU     Organizational Unit
L      Locality
S      State
C      Country
E      Email Address

If a field value contains a comma it must be quoted.
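
The lookup order described in these remarks (the "*" wildcard, then an exact match, then a substring match, then no selection) can be sketched in Python. The function name and the sample subjects are hypothetical, and whether the real comparison is case-sensitive is not stated here, so the case-sensitive comparison below is an assumption.

```python
from typing import Optional, Sequence


def select_certificate(store_subjects: Sequence[str], wanted: str) -> Optional[str]:
    """Pick a certificate subject from a store using the documented order."""
    if not store_subjects:
        return None
    if wanted == "*":
        return store_subjects[0]       # special value: first certificate
    for subject in store_subjects:     # first pass: exact match
        if subject == wanted:
            return subject
    for subject in store_subjects:     # second pass: subject contains value
        if wanted in subject:
            return subject
    return None                        # no certificate is selected


store = [
    "CN=www.server.com, OU=test, C=US",
    "CN=backup.server.com, OU=ops, C=US",
]
picked = select_certificate(store, "CN=backup.server.com")
```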

OAuthJWTIssuer

The issuer of the JSON Web Token.

Remarks

The issuer of the JSON Web Token. Enter the value of the service account email address.

This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys contain a copy of the issuer account.

OAuthJWTSubject

The user subject for which the application is requesting delegated access.

Remarks

The user subject for which the application is requesting delegated access. Enter the email address of the delegated user.

SSL

This section provides a complete list of the SSL properties you can configure in the connection string for this provider.


Property       Description
SSLServerCert  The certificate to be accepted from the server when connecting using TLS/SSL.

SSLServerCert

The certificate to be accepted from the server when connecting using TLS/SSL.

Remarks

If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine is rejected.

This property can take the following forms:

  • A full PEM Certificate (example shortened for brevity): -----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE-----
  • A path to a local file containing the certificate: C:\cert.cer
  • The public key (example shortened for brevity): -----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY-----
  • The MD5 Thumbprint (hex values can also be either space or colon separated): ecadbdda5a1529c58a1e9e09828d70e4
  • The SHA1 Thumbprint (hex values can also be either space or colon separated): 34a929226ae0819f2ec14b4a3d904f801cbb150d

If not specified, any certificate trusted by the machine is accepted.

Use '*' to signify to accept all certificates. Note that this is not recommended due to security concerns.
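
Because thumbprints may be written with spaces, colons, or no separators, comparing them requires normalization first. The following Python sketch shows one way to do that, assuming the thumbprint is the MD5 or SHA1 digest of the DER-encoded certificate; the function names are hypothetical and this is not the Sync App's implementation.

```python
import hashlib


def normalize_thumbprint(value: str) -> str:
    """Strip space/colon separators and lowercase the hex digits."""
    return value.replace(":", "").replace(" ", "").lower()


def matches_thumbprint(cert_der: bytes, pinned: str) -> bool:
    """Compare a certificate's digest with a pinned MD5 or SHA1 thumbprint."""
    wanted = normalize_thumbprint(pinned)
    # 32 hex digits is an MD5 digest, 40 hex digits is a SHA1 digest.
    algorithm = {32: "md5", 40: "sha1"}.get(len(wanted))
    if algorithm is None:
        raise ValueError("thumbprint must be 32 (MD5) or 40 (SHA1) hex digits")
    return hashlib.new(algorithm, cert_der).hexdigest() == wanted
```

Here cert_der stands for the raw DER bytes of the certificate presented by the server.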

Firewall

This section provides a complete list of the Firewall properties you can configure in the connection string for this provider.


Property          Description
FirewallType      The protocol used by a proxy-based firewall.
FirewallServer    The name or IP address of a proxy-based firewall.
FirewallPort      The TCP port for a proxy-based firewall.
FirewallUser      The user name to use to authenticate with a proxy-based firewall.
FirewallPassword  A password used to authenticate to a proxy-based firewall.

FirewallType

The protocol used by a proxy-based firewall.

Remarks

This property specifies the protocol that the Sync App will use to tunnel traffic through the FirewallServer proxy. Note that by default, the Sync App connects to the system proxy; to disable this behavior and connect to one of the following proxy types, set ProxyAutoDetect to false.

Type    Default Port  Description
TUNNEL  80            When this is set, the Sync App opens a connection to Google BigQuery and traffic flows back and forth through the proxy.
SOCKS4  1080          When this is set, the Sync App sends data through the SOCKS 4 proxy specified by FirewallServer and FirewallPort and passes the FirewallUser value to the proxy, which determines if the connection request should be granted.
SOCKS5  1080          When this is set, the Sync App sends data through the SOCKS 5 proxy specified by FirewallServer and FirewallPort. If your proxy requires authentication, set FirewallUser and FirewallPassword to credentials the proxy recognizes.

To connect to HTTP proxies, use ProxyServer and ProxyPort. To authenticate to HTTP proxies, use ProxyAuthScheme, ProxyUser, and ProxyPassword.
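
As an illustration of the defaults in the table above, the following Python sketch builds the firewall-related properties for a given type and fills in the default port when none is supplied. The function name and the server address are hypothetical; the property names and default ports come from this documentation.

```python
from typing import Optional

# Default ports per FirewallType, taken from the table above.
DEFAULT_FIREWALL_PORTS = {"TUNNEL": 80, "SOCKS4": 1080, "SOCKS5": 1080}


def firewall_properties(
    firewall_type: str, server: str, port: Optional[int] = None
) -> dict[str, str]:
    """Return connection properties for a proxy-based firewall."""
    key = firewall_type.upper()
    if key not in DEFAULT_FIREWALL_PORTS:
        raise ValueError(f"unsupported FirewallType: {firewall_type!r}")
    chosen_port = port if port is not None else DEFAULT_FIREWALL_PORTS[key]
    return {
        # The system proxy is used by default, so it is disabled here.
        "ProxyAutoDetect": "false",
        "FirewallType": key,
        "FirewallServer": server,
        "FirewallPort": str(chosen_port),
    }


socks5 = firewall_properties("SOCKS5", "proxy.example.com")
```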

FirewallServer

The name or IP address of a proxy-based firewall.

Remarks

This property specifies the IP address, DNS name, or host name of a proxy allowing traversal of a firewall. The protocol is specified by FirewallType: Use FirewallServer with this property to connect through SOCKS or do tunneling. Use ProxyServer to connect to an HTTP proxy.

Note that the Sync App uses the system proxy by default. To use a different proxy, set ProxyAutoDetect to false.

FirewallPort

The TCP port for a proxy-based firewall.

Remarks

This specifies the TCP port for a proxy allowing traversal of a firewall. Use FirewallServer to specify the name or IP address. Specify the protocol with FirewallType.

FirewallUser

The user name to use to authenticate with a proxy-based firewall.

Remarks

The FirewallUser and FirewallPassword properties are used to authenticate against the proxy specified in FirewallServer and FirewallPort, following the authentication method specified in FirewallType.

FirewallPassword

A password used to authenticate to a proxy-based firewall.

Remarks

This property is passed to the proxy specified by FirewallServer and FirewallPort, following the authentication method specified by FirewallType.

Proxy

This section provides a complete list of the Proxy properties you can configure in the connection string for this provider.


Property         Description
ProxyAutoDetect  This indicates whether to use the system proxy settings or not.
ProxyServer      The hostname or IP address of a proxy to route HTTP traffic through.
ProxyPort        The TCP port the ProxyServer proxy is running on.
ProxyAuthScheme  The authentication type to use to authenticate to the ProxyServer proxy.
ProxyUser        A user name to be used to authenticate to the ProxyServer proxy.
ProxyPassword    A password to be used to authenticate to the ProxyServer proxy.
ProxySSLType     The SSL type to use when connecting to the ProxyServer proxy.
ProxyExceptions  A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer.

ProxyAutoDetect

This indicates whether to use the system proxy settings or not.

Remarks

This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings.

To connect to an HTTP proxy, see ProxyServer. For other proxies, such as SOCKS or tunneling, see FirewallType.

ProxyServer

The hostname or IP address of a proxy to route HTTP traffic through.

Remarks

The hostname or IP address of a proxy to route HTTP traffic through. The Sync App can use the HTTP, Windows (NTLM), or Kerberos authentication types to authenticate to an HTTP proxy.

If you need to connect through a SOCKS proxy or tunnel the connection, see FirewallType.

By default, the Sync App uses the system proxy. If you need to use another proxy, set ProxyAutoDetect to false.

ProxyPort

The TCP port the ProxyServer proxy is running on.

Remarks

The port the HTTP proxy is running on that you want to redirect HTTP traffic through. Specify the HTTP proxy in ProxyServer. For other proxy types, see FirewallType.

ProxyAuthScheme

The authentication type to use to authenticate to the ProxyServer proxy.

Remarks

This value specifies the authentication type to use to authenticate to the HTTP proxy specified by ProxyServer and ProxyPort.

Note that the Sync App will use the system proxy settings by default, without further configuration needed; if you want to connect to another proxy, you will need to set ProxyAutoDetect to false, in addition to ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.

The authentication type can be one of the following:

  • BASIC: The Sync App performs HTTP BASIC authentication.
  • DIGEST: The Sync App performs HTTP DIGEST authentication.
  • NEGOTIATE: The Sync App retrieves an NTLM or Kerberos token based on the applicable protocol for authentication.
  • PROPRIETARY: The Sync App does not generate an NTLM or Kerberos token. You must supply this token in the Authorization header of the HTTP request.

If you need to use another authentication type, such as SOCKS 5 authentication, see FirewallType.

ProxyUser

A user name to be used to authenticate to the ProxyServer proxy.

Remarks

The ProxyUser and ProxyPassword options are used to connect and authenticate against the HTTP proxy specified in ProxyServer.

You can select one of the available authentication types in ProxyAuthScheme. If you are using HTTP authentication, set this to the user name of a user recognized by the HTTP proxy. If you are using Windows or Kerberos authentication, set this property to a user name in one of the following formats:

user@domain
domain\user

ProxyPassword

A password to be used to authenticate to the ProxyServer proxy.

Remarks

This property is used to authenticate to an HTTP proxy server that supports NTLM (Windows), Kerberos, or HTTP authentication. To specify the HTTP proxy, you can set ProxyServer and ProxyPort. To specify the authentication type, set ProxyAuthScheme.

If you are using HTTP authentication, additionally set ProxyUser and ProxyPassword to credentials recognized by the HTTP proxy.

If you are using NTLM authentication, set ProxyUser and ProxyPassword to your Windows user name and password. You may also need these to complete Kerberos authentication.

For SOCKS 5 authentication or tunneling, see FirewallType.

By default, the Sync App uses the system proxy. If you want to connect to another proxy, set ProxyAutoDetect to false.

ProxySSLType

The SSL type to use when connecting to the ProxyServer proxy.

Remarks

This property determines when to use SSL for the connection to an HTTP proxy specified by ProxyServer. This value can be AUTO, ALWAYS, NEVER, or TUNNEL. The applicable values are the following:

AUTO    Default setting. If the URL is an HTTPS URL, the Sync App will use the TUNNEL option. If the URL is an HTTP URL, the Sync App will use the NEVER option.
ALWAYS  The connection is always SSL enabled.
NEVER   The connection is not SSL enabled.
TUNNEL  The connection is through a tunneling proxy. The proxy server opens a connection to the remote host and traffic flows back and forth through the proxy.
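
The AUTO rule above can be sketched as a small Python function. This is only an illustration of the documented mapping, not the Sync App's implementation; the function name resolve_proxy_ssl_type is hypothetical, and treating any non-HTTPS scheme as NEVER is an assumption.

```python
from urllib.parse import urlparse


def resolve_proxy_ssl_type(setting: str, url: str) -> str:
    """Return the effective mode for a ProxySSLType setting and a target URL."""
    setting = setting.upper()
    if setting != "AUTO":
        return setting  # ALWAYS, NEVER, and TUNNEL are used as given
    scheme = urlparse(url).scheme.lower()
    # Documented rule: HTTPS -> TUNNEL, HTTP -> NEVER.
    # Assumption: any other scheme is treated like HTTP.
    return "TUNNEL" if scheme == "https" else "NEVER"


print(resolve_proxy_ssl_type("AUTO", "https://example.com/query"))
```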

ProxyExceptions

A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer.

Remarks

The ProxyServer is used for all addresses, except for addresses defined in this property. Use semicolons to separate entries.

Note that the Sync App uses the system proxy settings by default, without further configuration needed; if you want to explicitly configure proxy exceptions for this connection, you need to set ProxyAutoDetect = false, and configure ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.
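
To make the exception list concrete, the Python sketch below splits a ProxyExceptions value on semicolons and checks whether a destination is exempt. The function name bypasses_proxy is hypothetical, and exact case-insensitive matching is an assumption; the Sync App's actual matching rules (for example, any wildcard support) are not described here.

```python
def bypasses_proxy(host: str, proxy_exceptions: str) -> bool:
    """Return True if host appears in a semicolon-separated exception list."""
    exempt = {
        entry.strip().lower()
        for entry in proxy_exceptions.split(";")
        if entry.strip()
    }
    # Assumption: entries must match the destination exactly (ignoring case).
    return host.strip().lower() in exempt


exceptions = "localhost; 10.0.0.5"  # hypothetical ProxyExceptions value
print(bypasses_proxy("LOCALHOST", exceptions))
print(bypasses_proxy("example.com", exceptions))
```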

Logging

This section provides a complete list of the Logging properties you can configure in the connection string for this provider.


Property    Description
LogModules  Core modules to be included in the log file.

LogModules

Core modules to be included in the log file.

Remarks

Only the modules specified (separated by ';') will be included in the log file. By default all modules are included.

See the Logging page for an overview.

Schema

This section provides a complete list of the Schema properties you can configure in the connection string for this provider.


Property               Description
Location               A path to the directory that contains the schema files defining tables, views, and stored procedures.
BrowsableSchemas       This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.
Tables                 This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC.
Views                  Restricts the views reported to a subset of the available views. For example, Views=ViewA,ViewB,ViewC.
RefreshViewSchemas     Allows the provider to determine up-to-date view schemas automatically.
ShowTableDescriptions  Controls whether table descriptions are returned via the platform metadata APIs and sys_tables / sys_views.
PrimaryKeyIdentifiers  Set this property to define primary keys.
AllowedTableTypes      Specifies what kinds of tables will be visible.
FlattenObjects         Determines whether the provider flattens STRUCT fields into top-level columns.

Location

A path to the directory that contains the schema files defining tables, views, and stored procedures.

Remarks

The path to a directory which contains the schema files for the Sync App (.rsd files for tables and views, .rsb files for stored procedures). The folder location can be a relative path from the location of the executable. The Location property is only needed if you want to customize definitions (for example, change a column name, ignore a column, and so on) or extend the data model with new tables, views, or stored procedures.

If left unspecified, the default location is "%APPDATA%\CData\GoogleBigQuery Data Provider\Schema" with %APPDATA% being set to the user's configuration directory:

Platform %APPDATA%
Windows The value of the APPDATA environment variable
Linux ~/.config
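
The default location can be worked out per platform as in the Python sketch below. It only mirrors the table above; the function name default_schema_location and the env argument are hypothetical, and the example paths are placeholders.

```python
from pathlib import PurePosixPath, PureWindowsPath

SCHEMA_SUBDIR = ("CData", "GoogleBigQuery Data Provider", "Schema")


def default_schema_location(platform: str, env: dict) -> str:
    """Return the default Location for 'Windows' or 'Linux'."""
    if platform == "Windows":
        # %APPDATA% is the value of the APPDATA environment variable.
        return str(PureWindowsPath(env["APPDATA"]).joinpath(*SCHEMA_SUBDIR))
    # On Linux, %APPDATA% maps to ~/.config.
    home = PurePosixPath(env.get("HOME", "~"))
    return str(home.joinpath(".config", *SCHEMA_SUBDIR))


print(default_schema_location("Linux", {"HOME": "/home/me"}))
```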

BrowsableSchemas

This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.

Remarks

Listing the schemas from databases can be expensive. Providing a list of schemas in the connection string improves the performance.

Tables

This property restricts the tables reported to a subset of the available tables. For example, Tables=TableA,TableB,TableC.

Remarks

Listing the tables from some databases can be expensive. Providing a list of tables in the connection string improves the performance of the Sync App.

This property can also be used as an alternative to automatically listing tables if you already know which ones you want to work with and there would otherwise be too many to work with.

Specify the tables you want in a comma-separated list. Each table should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.

Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the table in this property, as in the last example here, to avoid ambiguity between tables that exist in multiple catalogs or schemas.
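
The quoting rules above can be sketched in Python as follows. This is an illustrative helper, not part of the Sync App: the names quote_identifier and tables_property are hypothetical, backticks are chosen as the only quoting style, and names that already contain a backtick are rejected rather than escaped.

```python
import re

# Names made only of letters, digits, and underscores are left unquoted.
_SIMPLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks when it contains special characters."""
    if _SIMPLE.match(name):
        return name
    if "`" in name:
        raise ValueError(f"cannot quote name containing a backtick: {name!r}")
    return f"`{name}`"


def tables_property(*tables: tuple) -> str:
    """Build a Tables=... entry; each table is a tuple of name parts."""
    qualified = (".".join(quote_identifier(p) for p in parts) for parts in tables)
    return "Tables=" + ",".join(qualified)


print(tables_property(("TableA",), ("WithCatalog", "WithSchema", "TableC With Space")))
```

The same approach applies to the Views property.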

Views

Restricts the views reported to a subset of the available views. For example, Views=ViewA,ViewB,ViewC.

Remarks

Listing the views from some databases can be expensive. Providing a list of views in the connection string improves the performance of the Sync App.

This property can also be used as an alternative to automatically listing views if you already know which ones you want to work with and there would otherwise be too many to work with.

Specify the views you want in a comma-separated list. Each view should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.

Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the view in this property, as in the last example here, to avoid ambiguity between views that exist in multiple catalogs or schemas.

RefreshViewSchemas

Allows the provider to determine up-to-date view schemas automatically.

Remarks

When using BigQuery views, BigQuery stores a copy of the view schema with the view itself. However, these stored view schemas are not updated when the tables used by the view change. This means that the stored view schema can easily become out of date and cause queries using the view to fail.

By default, the Sync App will not use the stored view schema and will instead query the view to determine the available columns. This guarantees that the schema will be up to date although it requires the Sync App to start a query job.

You can disable this option to force the Sync App to use the stored view schemas. This prevents the Sync App from running any queries when getting a view schema, but also means that queries using the view will fail if the schema is out of date.

ShowTableDescriptions

Controls whether table descriptions are returned via the platform metadata APIs and sys_tables / sys_views.

Remarks

By default table descriptions are not shown, since the Google BigQuery API requires an extra request beyond what is usually required for reading tables.

Enabling this option will show table descriptions, but will cost an extra API request for every table when a table list is fetched. This can slow down metadata operations on large datasets.

PrimaryKeyIdentifiers

Set this property to define primary keys.

Remarks

Google BigQuery does not natively support primary keys, but for certain DML operations or database tools you may need to define them. By default this option is disabled and no tables will have primary keys except for the ones defined in schema files (if you set Location).

Primary keys are defined using a list of rules which match tables and provide a list of key columns. For example, PrimaryKeyIdentifiers="*=key;transactions=tx_date,tx_serial;user_comments=" has three rules separated by semicolons:

  1. The first rule *=key means that every table without a more specific rule will have one primary key column called key. Tables that do not have a key column will not have any primary keys.
  2. The second rule transactions=tx_date,tx_serial means that the transactions table will have the two primary key columns tx_date and tx_serial. If any of those columns are missing from the table then they will be ignored.
  3. The third rule user_comments= means that the user_comments table will have no primary keys. The only use that empty key lists have is in overriding the default rule. If there is no default rule present then the only tables with primary keys would be the ones explicitly listed.

Note that the table names can include just the table, the table and dataset or the table, dataset and project. Both column and table names may be quoted using SQL quotes:

/* Rules with just table names use the connection ProjectId (or DataProjectId) and DatasetId. 
   All these rules refer to the same table with a connection where ProjectId=someProject;DatasetId=someDataset */
someTable=a,b,c
someDataset.someTable=a,b,c
someProject.someDataset.someTable=a,b,c

/* Any table or column name may be quoted */
`someProject`."someDataset".[someTable]=`a`,[b],"c"
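
The rule semantics described above (default rule, per-table override, missing columns ignored) can be sketched in Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, and only unquoted, unqualified table names are handled.

```python
def parse_primary_key_rules(value: str) -> dict:
    """Parse 'table=col1,col2;...' into {table: [columns]}."""
    rules = {}
    for rule in value.split(";"):
        if not rule.strip():
            continue
        table, _, columns = rule.partition("=")
        rules[table.strip()] = [c.strip() for c in columns.split(",") if c.strip()]
    return rules


def primary_keys(table: str, table_columns: list, rules: dict) -> list:
    """Pick the specific rule if present, else the '*' default rule."""
    wanted = rules.get(table, rules.get("*", []))
    # Key columns that do not exist on the table are ignored.
    return [c for c in wanted if c in table_columns]


rules = parse_primary_key_rules("*=key;transactions=tx_date,tx_serial;user_comments=")
print(primary_keys("orders", ["key", "total"], rules))
print(primary_keys("user_comments", ["key", "text"], rules))
```

An empty key list such as user_comments= is kept as an explicit entry, which is what lets it override the default rule.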

AllowedTableTypes

Specifies what kinds of tables will be visible.

Remarks

This option is a comma-separated list of the table type values that the Sync App displays. Any table-like or view-like entity that doesn't have a matching type will not be reported when listing tables.

  • TABLE: Standard tables
  • EXTERNAL: Read-only table stored on another service (like GCS or Drive)
  • SNAPSHOT: A read-only table that preserves the data of another table at a specific point in time
  • VIEW: Standard views
  • MATERIALIZED_VIEW: A view that is recalculated and cached each time its base table changes

For example, to restrict the Sync App to listing only simple tables and views, this option would be set to TABLE,VIEW
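
The filtering effect can be pictured with the short Python sketch below. The function name visible_entities and the sample entity names are hypothetical; only the type values come from the list above.

```python
def visible_entities(entities: list, allowed_table_types: str) -> list:
    """Keep (name, type) entries whose type is in the comma-separated list."""
    allowed = {t.strip().upper() for t in allowed_table_types.split(",") if t.strip()}
    return [name for name, kind in entities if kind in allowed]


# Hypothetical dataset contents.
entities = [
    ("orders", "TABLE"),
    ("orders_ext", "EXTERNAL"),
    ("orders_2023", "SNAPSHOT"),
    ("open_orders", "VIEW"),
    ("daily_totals", "MATERIALIZED_VIEW"),
]
print(visible_entities(entities, "TABLE,VIEW"))
```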

FlattenObjects

Determines whether the provider flattens STRUCT fields into top-level columns.

Remarks

By default the Sync App reports each field in a STRUCT column as its own column while the STRUCT column itself is hidden. This process is recursively applied to nested STRUCT values. For example, if the following table is defined in Google BigQuery then the Sync App reports 3 columns: location.coords.lat, location.coords.lon and location.country:

CREATE TABLE t(location STRUCT<coords STRUCT<lat FLOAT64, lon FLOAT64>, country STRING>);

If this property is disabled, then the top-level STRUCT is not expanded and is left as its own column. The value of this column is reported as a JSON aggregate. In the above example, the Sync App reports only the location column when flattening is disabled.
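
The recursive expansion can be sketched in Python by modelling the STRUCT from the example as nested dictionaries. This only illustrates how the dotted column names arise; the function name flatten_columns and the dictionary representation are assumptions.

```python
def flatten_columns(schema: dict, prefix: str = "") -> list:
    """Expand nested STRUCT fields (dicts) into dotted column names."""
    columns = []
    for name, field_type in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(field_type, dict):
            # A nested STRUCT: recurse and hide the STRUCT column itself.
            columns.extend(flatten_columns(field_type, path))
        else:
            columns.append(path)
    return columns


# Mirrors: location STRUCT<coords STRUCT<lat FLOAT64, lon FLOAT64>, country STRING>
schema = {"location": {"coords": {"lat": "FLOAT64", "lon": "FLOAT64"}, "country": "STRING"}}
print(flatten_columns(schema))  # flattening enabled
print(list(schema))             # flattening disabled: top-level columns only
```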

Miscellaneous

This section provides a complete list of the Miscellaneous properties you can configure in the connection string for this provider.


Property                  Description
StorageTimeout            How long a Storage API connection must remain idle before the provider reconnects.
AllowAggregateParameters  Allows raw aggregates to be used in parameters when QueryPassthrough is enabled.
ApplicationName           An application name in the form application/version. For example, AcmeReporting/1.0.
AuditLimit                The maximum number of rows which will be stored within an audit table.
AuditMode                 What provider actions should be recorded to audit tables.
BigQueryOptions           A comma separated list of Google BigQuery options.
GenerateSchemaFiles       Indicates the user preference as to when schemas should be generated and saved.
MaximumBillingTier        The MaximumBillingTier is a positive integer that serves as a multiplier of the basic price per TB. For example, if you set MaximumBillingTier to 2, the maximum cost for that query will be 2x basic price per TB.
MaximumBytesBilled        Limits how many bytes BigQuery will allow a job to consume before it is cancelled.
MaxRows                   Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
Other                     These hidden properties are used only in specific use cases.
PseudoColumns             This property indicates whether or not to include pseudo columns as columns to the table.
QueryPassthrough          This option passes the query to the Google BigQuery server as is.
TableSamplePercent        This determines what percent of a table is sampled with the TABLESAMPLE operator.
Timeout                   The value in seconds until the timeout error is thrown, canceling the operation.
UserDefinedViews          A filepath pointing to the JSON configuration file containing your custom views.

StorageTimeout

How long a Storage API connection must remain idle before the provider reconnects.

Remarks

Google BigQuery and many proxies/firewalls restrict the amount of time that idle connections stay alive before they are forcibly closed. This can be a problem when using the Storage API because the Sync App may stream data faster than it can be consumed. While the consumer is catching up, the Sync App does not use its connection and it may be closed by the next time the Sync App uses it.

To avoid this, the Sync App automatically closes and reopens the connection if it has been idle for too long. This property controls how many seconds the connection must be idle before the Sync App resets it. To disable these resets, set this property to 0 or a negative value.
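
The idle-reset behavior can be sketched roughly as follows. This Python class is a hypothetical stand-in, not the Sync App's code: IdleResettingConnection, its reconnect counter, and the 300-second value used in the example are all illustrative assumptions.

```python
import time


class IdleResettingConnection:
    """Counts a reconnect whenever the idle time reaches storage_timeout."""

    def __init__(self, storage_timeout: float, clock=time.monotonic):
        self.storage_timeout = storage_timeout
        self._clock = clock
        self._last_used = clock()
        self.reconnects = 0

    def use(self) -> None:
        now = self._clock()
        idle = now - self._last_used
        # A value of 0 or less disables the reset entirely.
        if self.storage_timeout > 0 and idle >= self.storage_timeout:
            self.reconnects += 1  # stand-in for closing and reopening the stream
        self._last_used = now


# Simulated clock so the example does not need to sleep.
fake_now = [0.0]
conn = IdleResettingConnection(300, clock=lambda: fake_now[0])
fake_now[0] = 100.0
conn.use()   # idle for 100 s: connection is reused
fake_now[0] = 500.0
conn.use()   # idle for 400 s: connection is reset first
print(conn.reconnects)
```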

AllowAggregateParameters

Allows raw aggregates to be used in parameters when QueryPassthrough is enabled.

Remarks

This option affects how string parameters are handled when using direct queries through QueryPassthrough. For example, consider this query:

INSERT INTO proj.data.tbl(x) VALUES (@x)

By default, this option is disabled and string parameters are quoted and escaped into SQL strings. That means that any value can be safely used as a string parameter, but it also means that parameters cannot be used as raw aggregate values:

/*
 * If @x is set to: test value ' contains quote
 *
 * Result is a valid query
*/
INSERT INTO proj.data.tbl(x) VALUES ('test value \' contains quote')

/*
 * If @x is set to: ['valid', ('aggregate', 'value')]
 *
 * Result contains string instead of aggregate:
*/
INSERT INTO proj.data.tbl(x) VALUES ('[\'valid\', (\'aggregate\', \'value\')]')

When this option is enabled, string parameters are inserted directly into the query. This means that raw aggregates can be used as parameters, but it also means that all simple strings must be escaped:

/*
 * If @x is set to: test value ' contains quote
 *
 * Result is an invalid query
*/
INSERT INTO proj.data.tbl(x) VALUES (test value ' contains quote)

/*
 * If @x is set to: ['valid', ('aggregate', 'value')]
 *
 * Result is an aggregate
*/
INSERT INTO proj.data.tbl(x) VALUES (['valid', ('aggregate', 'value')])
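
The two modes can be imitated with a few lines of Python. This is a simplified sketch for illustration only: the function names are hypothetical, naive text substitution stands in for real parameter binding, and escaping backslashes in addition to single quotes is an assumption.

```python
def render_string_parameter(value: str, allow_aggregate_parameters: bool) -> str:
    """Render a string parameter as SQL text in either mode."""
    if allow_aggregate_parameters:
        return value  # inserted as-is; the caller must supply valid SQL
    # Assumption: backslashes are escaped along with single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def bind(query: str, name: str, value: str, allow_aggregate_parameters: bool) -> str:
    """Substitute @name in the query (naive replacement, for illustration)."""
    return query.replace(f"@{name}", render_string_parameter(value, allow_aggregate_parameters))


query = "INSERT INTO proj.data.tbl(x) VALUES (@x)"
print(bind(query, "x", "test value ' contains quote", False))
print(bind(query, "x", "['valid', ('aggregate', 'value')]", True))
```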

ApplicationName

An application name in the form application/version. For example, AcmeReporting/1.0.

Remarks

The Sync App identifies itself to BigQuery using a Google partner User-Agent header. The first part of the User-Agent is fixed and identifies the client as a specific build of the CData Sync App. The last portion reports the specific application using the Sync App.

AuditLimit

The maximum number of rows which will be stored within an audit table.

Remarks

When auditing is enabled with the AuditMode option, this property is used to determine how many rows will be allowed in the audit table at once.

By default this property is 1000, meaning that only the 1000 most recent audit events will be available within the audit table.

This property can also be set to -1, which places no limits on the size of the audit table. In this mode, the audit table should be periodically cleared to prevent the Sync App from using excessive memory.

DELETE FROM AuditJobs#TEMP
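
The "most recent N rows" behavior resembles a bounded queue. The Python sketch below models it with collections.deque; it is an analogy rather than the Sync App's storage, and the function name make_audit_table is hypothetical.

```python
from collections import deque


def make_audit_table(audit_limit: int = 1000) -> deque:
    """Model an audit table: -1 means unlimited, otherwise keep the newest rows."""
    return deque(maxlen=None if audit_limit == -1 else audit_limit)


limited = make_audit_table(3)
unlimited = make_audit_table(-1)
for event_id in range(5):
    limited.append(event_id)    # oldest rows fall off once the limit is reached
    unlimited.append(event_id)  # grows until cleared

print(list(limited))
unlimited.clear()  # counterpart of the periodic DELETE shown above
```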

AuditMode

What provider actions should be recorded to audit tables.

Remarks

The Sync App can record certain internal actions taken when it runs queries. For each of those actions listed in this option, the Sync App will create a temporary audit table which logs when the action took place, what query caused the action, and any other relevant information.

By default this option is set to 'none' and the Sync App does not record any audit information. This option can also be set to a comma-separated list of the following actions:

Mode Name   Audit Table     Description                               Columns
start-jobs  AuditJobs#TEMP  Records all jobs started by the Sync App  Timestamp,Query,ProjectId,Location,JobId

Refer to AuditLimit for more information on how to limit the size of these tables.

BigQueryOptions

A comma separated list of Google BigQuery options.

Remarks

A list of Google BigQuery options:

Option                   Description
gbqoImplicitJoinAsUnion  This option will prevent the driver from converting an IMPLICIT JOIN into a CROSS JOIN as expected by SQL92. Instead, it will leave it as an IMPLICIT JOIN, which Google BigQuery will execute as a UNION ALL.

GenerateSchemaFiles

Indicates the user preference as to when schemas should be generated and saved.

Remarks

This property outputs schemas to .rsd files in the path specified by Location.

Available settings are the following:

  • Never: A schema file will never be generated.
  • OnUse: A schema file will be generated the first time a table is referenced, provided the schema file for the table does not already exist.
  • OnStart: A schema file will be generated at connection time for any tables that do not currently have a schema file.
  • OnCreate: A schema file will be generated when running a CREATE TABLE SQL query.

Note that if you want to regenerate a file, you will first need to delete it.

Generate Schemas with SQL

When you set GenerateSchemaFiles to OnUse, the Sync App generates schemas as you execute SELECT queries. Schemas are generated for each table referenced in the query.

When you set GenerateSchemaFiles to OnCreate, schemas are only generated when a CREATE TABLE query is executed.

Generate Schemas on Connection

Another way to use this property is to obtain schemas for every table in your database when you connect. To do so, set GenerateSchemaFiles to OnStart and connect.

MaximumBillingTier

The MaximumBillingTier is a positive integer that serves as a multiplier of the basic price per TB. For example, if you set MaximumBillingTier to 2, the maximum cost for that query will be 2x basic price per TB.

Remarks

Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). If unspecified, this will be set to your project default.

If your query is too compute intensive for BigQuery to complete at the standard per TB pricing tier, BigQuery returns a billingTierLimitExceeded error and an estimate of how much the query would cost. To run the query at a higher pricing tier, pass a new value for maximumBillingTier as part of the query request. The maximumBillingTier is a positive integer that serves as a multiplier of the basic price per TB. For example, if you set maximumBillingTier to 2, the maximum cost for that query will be 2x basic price per TB.

MaximumBytesBilled

Limits how many bytes BigQuery will allow a job to consume before it is cancelled.

Remarks

When this value is provided, all jobs will use this value as their default billing cap. If a job uses more than this many bytes, BigQuery will cancel it and it will not be billed. By default there is no cap and all jobs will be billed for however many bytes they consume.

This only has an effect when using DestinationTable or when using the InsertJob stored procedure. BigQuery does not allow standard query jobs to have byte limits.

MaxRows

Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.

Other

These hidden properties are used only in specific use cases.

Remarks

The properties listed below are available for specific use cases. Normal driver use cases and functionality should not require these properties.

Specify multiple properties in a semicolon-separated list.

Integration and Formatting

DefaultColumnSize      Sets the default length of string fields when the data source does not provide column length in the metadata. The default value is 2000.
ConvertDateTimeToGMT   Determines whether to convert date-time values to GMT, instead of the local time of the machine.
RecordToFile=filename  Records the underlying socket data transfer to the specified file.

PseudoColumns

This property indicates whether or not to include pseudo columns as columns to the table.

Remarks

This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; for example, "*=*".
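
To show how the value format breaks down, the Python sketch below parses a PseudoColumns string into a table-to-columns mapping. The function name parse_pseudo_columns is hypothetical, and how the Sync App itself interprets the value beyond the format shown above is not covered.

```python
def parse_pseudo_columns(value: str) -> dict:
    """Parse 'Table1=Column1, Table1=Column2, ...' into {table: [columns]}."""
    mapping = {}
    for pair in value.split(","):
        if not pair.strip():
            continue
        table, _, column = pair.partition("=")
        mapping.setdefault(table.strip(), []).append(column.strip())
    return mapping


print(parse_pseudo_columns("Table1=Column1, Table1=Column2, Table2=Column3"))
print(parse_pseudo_columns("*=*"))  # all tables, all columns
```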

QueryPassthrough

This option passes the query to the Google BigQuery server as is.

Remarks

When this is set, queries are passed through directly to Google BigQuery.

TableSamplePercent

This determines what percent of a table is sampled with the TABLESAMPLE operator.

Remarks

This option can be set to make the Sync App use the TABLESAMPLE operator for each table referenced by a query. The value determines what percent is provided to the PERCENT clause. That clause will only be generated if this property's value is above zero.

-- Input SQL
SELECT * FROM `tbl`

-- Generated Google BigQuery SQL when TableSamplePercent=10
SELECT * FROM `tbl` TABLESAMPLE SYSTEM (10 PERCENT)

This option is subject to a few limitations:

  • It is applied during query conversion and has no effect when QueryPassthrough is set.
  • More rows may be returned than expected due to how the server implements TABLESAMPLE. Please see the Google BigQuery documentation for more information.
  • TABLESAMPLE is not supported on views. If a view is queried in sampling mode, the Sync App will omit the TABLESAMPLE clause for the view.
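
The rewrite rules above (clause only when the value is above zero, and never for views) can be sketched in Python. The function name apply_table_sample and the is_view flag are hypothetical; the real conversion happens inside the Sync App's query translator.

```python
def apply_table_sample(table: str, percent: float, is_view: bool = False) -> str:
    """Return the table reference, adding TABLESAMPLE when it applies."""
    reference = f"`{table}`"
    if percent > 0 and not is_view:
        return f"{reference} TABLESAMPLE SYSTEM ({percent:g} PERCENT)"
    return reference  # views and non-positive values are left untouched


print("SELECT * FROM " + apply_table_sample("tbl", 10))
print("SELECT * FROM " + apply_table_sample("some_view", 10, is_view=True))
```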

Timeout

The value in seconds until the timeout error is thrown, canceling the operation.

Remarks

If Timeout = 0, operations do not time out. The operations run until they complete successfully or until they encounter an error condition.

If Timeout expires and the operation is not yet complete, the Sync App throws an exception.

UserDefinedViews

A filepath pointing to the JSON configuration file containing your custom views.

Remarks

User Defined Views are defined in a JSON-formatted configuration file called UserDefinedViews.json. The Sync App automatically detects the views specified in this file.

You can also have multiple view definitions and control them using the UserDefinedViews connection property. When you use this property, only the specified views are seen by the Sync App.

This User Defined View configuration file is formatted as follows:

  • Each root element defines the name of a view.
  • Each root element contains a child element, called query, which contains the custom SQL query for the view.

For example:

{
	"MyView": {
		"query": "SELECT * FROM [publicdata].[samples].github_nested WHERE MyColumn = 'value'"
	},
	"MyView2": {
		"query": "SELECT * FROM MyTable WHERE Id IN (1,2,3)"
	}
}

Use the UserDefinedViews connection property to specify the location of your JSON configuration file. For example:

"UserDefinedViews", C:\Users\yourusername\Desktop\tmp\UserDefinedViews.json

Note that the specified path is not embedded in quotation marks.
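
Before pointing the property at a file, it can help to check that the file follows the structure described above. The Python sketch below is one possible pre-flight check, not a Sync App feature; the function name load_user_defined_views is hypothetical.

```python
import json


def load_user_defined_views(text: str) -> dict:
    """Parse UserDefinedViews JSON text into {view_name: query}."""
    config = json.loads(text)
    views = {}
    for name, definition in config.items():
        # Each root element must contain a child element called "query".
        if not isinstance(definition, dict) or not isinstance(definition.get("query"), str):
            raise ValueError(f"view {name!r} needs a string 'query' element")
        views[name] = definition["query"]
    return views


sample = '{"MyView2": {"query": "SELECT * FROM MyTable WHERE Id IN (1,2,3)"}}'
print(load_user_defined_views(sample))
```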

Third Party Copyrights

protobuf

Copyright 2008 Google Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of Google Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Code generated by the Protocol Buffer compiler is owned by the owner of the input file used when generating it. This code is not standalone and requires a support library to be linked with it. This support library is itself covered by the above license.

Copyright (c) 2024 CData Software, Inc. - All rights reserved.
Build 23.0.8839