Google Cloud Storage

Version 25.3.9414




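You can use the Google Cloud Storage connector from the CData Sync application to move data from any supported data source to a Google Cloud Storage destination. To do so, you need to add the connector, authenticate to the connector, and complete your connection.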

Prerequisites

Before you configure the Google Cloud Storage destination with the Delta Parquet file format on the Microsoft Windows operating system (OS), make sure that your environment meets the requirements explained below. These prerequisites ensure that Sync can interact correctly with Delta Lake by locating the required Hadoop binaries under Windows.

Configure your Windows OS, as follows:

  • Download Hadoop binaries (recommended version: 2.8.1 or later).

  • Set the HADOOP_HOME environment variable to the Hadoop installation directory.

  • Ensure that %HADOOP_HOME%\bin is included in your PATH system variable (specifically, %HADOOP_HOME%\bin\winutils.exe must be accessible).

This configuration is necessary because Delta Lake (on Spark) uses Hadoop’s file system APIs to access local storage, the Hadoop Distributed File System (HDFS), and cloud object stores like Google Cloud Storage. Under Windows, Spark must be able to locate the Hadoop binaries (including winutils.exe and other native libraries) to function correctly. Without this configuration, operations such as writing Delta tables, managing checkpoints, or accessing cloud storage can fail with permission or file-system errors.
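
The checks described above can be sketched as a small script that you run before you configure the connection. This is only an illustration, not part of Sync: the function name check_hadoop_env is hypothetical, and the script verifies only the three conditions listed in this section (HADOOP_HOME is set, winutils.exe exists in its bin directory, and that directory is on PATH).

```python
import os
from pathlib import Path


def check_hadoop_env(env=None):
    """Return a list of problems with the Hadoop setup described above.

    `env` is a mapping of environment variables; it defaults to os.environ.
    An empty list means HADOOP_HOME is set, winutils.exe exists in the bin
    directory under HADOOP_HOME, and that bin directory is on PATH.
    """
    env = os.environ if env is None else env

    hadoop_home = env.get("HADOOP_HOME", "").strip()
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]

    problems = []
    bin_dir = Path(hadoop_home) / "bin"
    if not (bin_dir / "winutils.exe").is_file():
        problems.append(f"winutils.exe not found in {bin_dir}")

    # Normalize case and separators so that equivalent Windows paths match.
    def normalize(path):
        return os.path.normcase(os.path.normpath(path))

    path_entries = [normalize(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if normalize(str(bin_dir)) not in path_entries:
        problems.append(f"{bin_dir} is not on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_hadoop_env():
        print("Problem:", problem)
```

If the script prints nothing, the three conditions hold. It does not confirm that the Hadoop version matches the recommendation or that the other native libraries are present.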

Supported File Formats

When Sync writes data to Google Cloud Storage, you can choose the file format for the exported data. The following file formats are supported for the Google Cloud Storage destination:

  • (Default) Delta Parquet—A format that adds a Delta Lake storage layer on top of the Parquet file format. Sync uses this layer to support delta processing, in which only new or modified files are written or read in runs that follow the initial job run. This approach can reduce job times and resource use.

    Limitations:

    • Naming restrictions: Table and column names cannot include special characters or reserved SQL and Delta Lake keywords. Examples of special characters include spaces, commas, semicolons, braces, parentheses, equal signs, and the newline (\n) and tab (\t) characters.

    • Primary keys: Primary key constraints are not supported. Sync uses the source primary keys for incremental replication.

    • Data types: Unlike traditional databases, Delta Lake does not support column-size definitions (for example, VARCHAR(100)). It supports only a fixed set of data types and allows type widening when necessary.

    • Schema changes: The ALTER TABLE command supports only adding new columns. Changing the data type of an existing column (for example, from INT to VARCHAR) is not supported.

    • Delete operations: In standard jobs, both hard and soft deletions are supported. In CDC and enhanced CDC jobs, only soft deletions are supported.

  • Parquet—A columnar storage format that is optimized for analytics.

  • CSV—Plain text comma-separated values.

  • Avro—A row-based binary format that supports schema evolution.
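
The naming restriction for Delta Parquet can be checked ahead of a job run with a sketch like the following. It is an assumption-laden illustration: the character set contains only the examples named in the limitation, the reserved-word set is a tiny hypothetical sample rather than the real SQL or Delta Lake keyword list, and the function name name_problems is invented for this example.

```python
# Characters that the naming limitation lists as examples. The full set that
# is rejected in practice may be larger.
SPECIAL_CHARACTERS = set(" ,;{}()=\n\t")

# A tiny illustrative sample of reserved words (an assumption, not the
# complete SQL or Delta Lake keyword list).
SAMPLE_RESERVED_WORDS = {"select", "from", "where", "table"}


def name_problems(name):
    """Return reasons why `name` may be rejected as a table or column name."""
    problems = []
    found = sorted(ch for ch in set(name) if ch in SPECIAL_CHARACTERS)
    if found:
        problems.append("special characters: " + ", ".join(repr(ch) for ch in found))
    if name.lower() in SAMPLE_RESERVED_WORDS:
        problems.append("reserved keyword")
    return problems


if __name__ == "__main__":
    for candidate in ["order_total", "order total", "select", "amount(usd)"]:
        print(repr(candidate), name_problems(candidate) or "ok")
```

A check like this is useful mainly for source tables whose names you can alias or rename before replication.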

Prerequisites

If you want to authenticate with a Google Cloud service account (for silent authentication or delegated organization-wide access), do the following before configuring your connection in Sync:

  • Create a service account in Google Cloud and grant it the required permissions on your bucket or project.

  • For OAuth JWT authentication, register a custom OAuth application and download the certificate file (.p12 or .pfx).

  • For Service Account Key File authentication (Delta Parquet only), download the JSON key file for the service account.

Then proceed to the relevant authentication method below.

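Add the Google Cloud Storage Connector

To enable Sync to use data from Google Cloud Storage, you must first add the connector, as follows:

  1. Open the Connections page of the Sync dashboard.

  2. Click Add Connection to open the Select Connectors page.

  3. Click the Sources tab and locate the Google Cloud Storage row.

  4. Click the Configure Connection icon at the end of that row to open the New Connection page. If the Configure Connection icon is not available, click the Download Connector icon to install the Google Cloud Storage connector. For more information about installing new connectors, see Connections.

Authenticate to Google Cloud Storage

After you add the connector, you need to set the required properties.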

  • Connection Name: Enter a connection name of your choice.

  • File Format: Select the file format that you want to use: Delta Parquet (default), CSV, Avro, or Parquet.

  • URI: Enter the path to the bucket and folder that contain your files (for example, gs://BucketName/RemotePath).

  • Project Id: Enter the identifier (Id) of the project where your Google Cloud Storage instance resides.

    Note: This property is required only with the Avro file format. For other file formats, the property is optional and it is set in Complete Your Connection.
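
To illustrate the URI form shown above, the following sketch splits a gs:// URI into its bucket and folder parts. It uses only the Python standard library; the function name split_gs_uri and the sample bucket name are hypothetical, and the sketch assumes that the path contains no ? or # characters, which urlsplit would treat as query or fragment delimiters.

```python
from urllib.parse import urlsplit


def split_gs_uri(uri):
    """Split a URI of the form gs://bucket/path into (bucket, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"expected a URI of the form gs://bucket/path, got {uri!r}")
    # The leading slash separates the bucket from the path and is not part
    # of the folder name itself.
    return parts.netloc, parts.path.lstrip("/")


if __name__ == "__main__":
    bucket, folder = split_gs_uri("gs://example-bucket/exports/daily")
    print("bucket:", bucket)
    print("folder:", folder)
```

A quick check like this catches a missing bucket name or a wrong scheme before you test the connection.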

CData Sync supports authenticating to Google Cloud Storage in several ways. Select your authentication method below to proceed to the relevant section that contains the authentication details.

Note: The full list of authentication methods below applies to all file formats except Delta Parquet. Delta Parquet supports only the Service Account Key File method.

Service Account Key File

To connect with a service account key file, specify the following properties:

  • Auth Scheme: Select ServiceAccountKeyFile.

  • Key file path: Enter the path where the JSON key file for the service account is located.
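
Before you enter the key file path, you might want to confirm that the file is a readable service account key. The sketch below is an illustration under stated assumptions: it checks for fields that a Google Cloud service account JSON key normally contains (type, project_id, private_key, client_email), the function name key_file_problems is hypothetical, and the check says nothing about whether the key is still valid or has the required permissions.

```python
import json

# Fields that a Google Cloud service account JSON key normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def key_file_problems(path):
    """Return a list of problems found in the JSON key file at `path`."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except OSError as exc:
        return [f"cannot read file: {exc}"]
    except ValueError as exc:  # covers invalid JSON and bad encoding
        return [f"not valid JSON: {exc}"]

    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # Other credential files (for example, user credentials) use a different type.
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']!r}")
    return problems


if __name__ == "__main__":
    import sys

    for problem in key_file_problems(sys.argv[1] if len(sys.argv) > 1 else "key.json"):
        print("Problem:", problem)
```

The script never prints the private key itself, so its output is safe to share when you troubleshoot.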

OAuth

To connect with OAuth custom credentials, specify the following properties:

  • Auth Scheme: Select OAuth.

  • OAuth Version: Select the version of OAuth that you want to use. The default version is 2.0.

  • (Optional) Scope: Specify the scope of your access to the application.

  • (Optional) OAuth Authorization URL: Enter the OAuth authorization URL for the OAuth service.

  • (Optional) OAuth Access Token URL: Enter the URL from which to retrieve the access token.

  • (Optional) OAuth Refresh Token URL: Enter the URL from which to refresh the OAuth token.

OAuth PKCE

CData Sync provides an embedded OAuth application with which to connect. To connect with the OAuth PKCE extension, specify the following properties:

  • Auth Scheme: Select OAuthPKCE.

  • OAuth Client Id: Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.

OAuth JWT

To connect with a Google Cloud Storage account, specify the following properties:

  • Auth Scheme: Select OAuthJWT.

  • OAuth JWT Cert: Enter your JSON Web Token (JWT) certificate store.

  • OAuth JWT Cert Type: Enter the type of key store that contains your JWT certificate. The default type is PEMKEY_BLOB.

  • OAuth Client Id: Enter the client identifier (Id) that you were assigned when you registered your application with an OAuth authorization server.

  • OAuth Client Secret: Enter the client secret that you were assigned when you registered your application with an OAuth authorization server.

  • (Optional) Scope: Specify the scope of your access to the application.

  • (Optional) OAuth Authorization URL: Enter the OAuth authorization URL for the OAuth service.

  • (Optional) OAuth Access Token URL: Enter the URL from which to retrieve the access token.

  • (Optional) OAuth Refresh Token URL: Enter the URL from which to refresh the OAuth token.

  • (Optional) OAuth JWT Cert Password: Enter the password for your OAuth JWT certificate.

  • (Optional) OAuth JWT Cert Subject: Enter the subject of your OAuth JWT certificate.

  • (Optional) OAuth JWT Subject: Enter the user subject for which the application is requesting delegated access.

  • (Optional) OAuth JWT Subject Type: Select the subject type (enterprise or user) for the JWT authentication. The default type is enterprise.

  • (Optional) OAuth JWT Public Key Id: Enter the Id of the public key for JWT.

GCP Instance Account

When CData Sync runs on a GCP virtual machine, Sync can authenticate by using the service account that is tied to that virtual machine. To use that account, set Auth Scheme to GCPInstanceAccount. No additional properties are required.
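
The authentication rules in this section can be summarized in a small lookup, sketched below. This is a reading aid rather than a description of how Sync validates connections: the property names are the labels used in this section, properties with a documented default (OAuth Version and OAuth JWT Cert Type) and all optional properties are left out, and the function name missing_properties is hypothetical.

```python
# Properties without a documented default that each Auth Scheme needs,
# as listed in the authentication sections of this page.
REQUIRED_BY_SCHEME = {
    "ServiceAccountKeyFile": ["Key file path"],
    "OAuth": [],
    "OAuthPKCE": ["OAuth Client Id"],
    "OAuthJWT": ["OAuth JWT Cert", "OAuth Client Id", "OAuth Client Secret"],
    "GCPInstanceAccount": [],
}


def missing_properties(file_format, auth_scheme, settings):
    """Return the required property names that `settings` leaves empty."""
    if auth_scheme not in REQUIRED_BY_SCHEME:
        raise ValueError(f"unknown Auth Scheme: {auth_scheme!r}")
    # Delta Parquet supports only the Service Account Key File method.
    if file_format == "Delta Parquet" and auth_scheme != "ServiceAccountKeyFile":
        raise ValueError("Delta Parquet supports only ServiceAccountKeyFile")

    required = list(REQUIRED_BY_SCHEME[auth_scheme])
    # Project Id is required only with the Avro file format.
    if file_format == "Avro":
        required.append("Project Id")
    return [name for name in required if not settings.get(name)]


if __name__ == "__main__":
    print(missing_properties("Avro", "OAuthPKCE", {"OAuth Client Id": "example-id"}))
```

In the usage line, the only property reported as missing is Project Id, because the Avro file format requires it.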

Complete Your Connection

To complete your connection:

  1. Specify the following properties:

    For all file formats:

    • (Optional) Project Id: Enter the identifier (Id) of the project where your Google Cloud Storage instance resides.

      Note: This property is required for the Avro file format, and it is set in Authenticate to Google Cloud Storage.

    For the Delta Parquet and CSV file formats:

    • FMT: Enter the format that you want to use to parse all text files. The default format is CsvDelimited.

    • Aggregate Files: Specify whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. The default option is False.

    • Include Column Headers: Specify whether you want to obtain column headers from the first lines of the specified files. The default option is True.

    For the Avro and Parquet file formats:

    • Data Model: Select the data model that you want to use to parse documents for your format and to generate the database metadata. The default data model is Document.

    • Aggregate Files: Specify whether you want to aggregate all the files that are located in the URI directory and that have the same schema into a single table named AggregatedFiles. The default option is False.

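  2. Define advanced connection settings on the Advanced tab. (In most cases, though, you should not need these settings.)

  3. If you authenticate with OAuth or OAuthPKCE, click Connect to Google Cloud Storage to connect to your Google Cloud Storage account.

  4. Click Create & Test to create your connection.

More Information

For more information about how CData Sync works with Google Cloud Storage, see Google Cloud Storage Connector for CData Sync.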