DBT Transformations

Version 22.0.8483

DBT Transformations

CData Sync now integrates with dbt Core (a data building tool by dbt Labs™), which enables you to create powerful transformations. This integration enables you to develop, control, and run transformations to build Python models in a supported destination database.

You have complete control over your project repository and the ability to write SQL SELECT statements to transform your data and build models. Sync simply creates a pipeline for you to run those transformations in your chosen destination.

If you only need to execute simple transformations, then basic SQL transformations are sufficient. However, for advanced modeling requirements, transformations that use dbt Core models offer more robust capabilities. For example, if your models contain dependencies, a transformation that uses a dbt Core project recognizes and honors those dependencies, enabling the models to build in the correct order.

Supported Destinations

Transformations of type DBT are supported for the following destinations:

  • Snowflake

  • Amazon Redshift

  • Databricks

  • PostgreSQL

dbt Core Details

Setting Up Tools for DBT Transformations

You need to set up dbt Core and other tools first to enable DBT transformations, as follows:

  1. Install Git for Windows and Python 3.7 or later for Microsoft Windows.

    Note: An application like Git is not required. However, such a repository is recommended because it provides versioning, enables you to share projects, and gives you more control of your source code.

    In Git, locate and copy the following information for later use in the Sync settings for a DBT transformation:

    • Git Repository URL - The repository URL is available in your Git repository window. Click Code to open the Clone dialog. The URL is on the HTTPS tab in the dialog. Copy the URL by clicking the copy icon on the right end of the URL field. Save the icon for use later in Sync.

    • Personal Access Token - To obtain your personal access token:

      1. Click your profile icon in the top right of the Git window and select Settings to open the Profile settings page.

      2. Select Developer Settings > Personal access tokens. This selection opens the New personal access token pane.

      3. Select the Repo check box in the Select scopes category. (You can set other criteria in this category as well, but it is not necessary.) For example, you might want to choose an expiration period (or no expiration) for your token under the Expiration category.

      4. Click Generate token at the bottom of the page. Copy and save the generated token for use later in Sync.

      For more details about obtaining a personal access token, see Creating a personal access token.

  2. Install dbt Core and the adapter plug-in for your destination on the same machine where Sync is installed. Plug-ins are available for each of the destinations that are listed earlier.

  3. Download your dbt projects locally by cloning your Git repository to your local machine. Sync clones the repository upon setup and before each run to ensure that any changes to the project are reflected in the next run from Sync.

  4. Create a transformation in Sync (using dbt projects that contain YML or SQL files), as described in the next section.

Creating a DBT Transformation

To create a transformation:

  1. Open Sync and navigate to the Transformations page.

  2. Click Create Transformation. Then, fill out the fields in the Create New Transformation dialog, as follows:

    • Transformation Name - Enter a name for your transformation.

    • Type - Select DBT as the transformation type.

    • Project Folder - Select Github.

    • Git Repository URL - Enter the URL that you copied earlier in Git. This URL should be in the form https:github.com/owner/repository.git

    • Token - Enter the personal access token that you copied earlier in Git.

    • Destination Schema - Add a destination schema of your choice.

  3. Click Create.

After you create your transformation, you can adjust settings, set notification alerts, and schedule the transformation to run from the Job Settings tabs. These tabs and settings are explained in the section Understanding the Transformations Detail Page.

dbt Cloud Details

Connecting CData Sync to dbt Cloud

dbt Cloud is an integrated development environment (IDE) in which you can write, run, and test code as well as manage version control in your dbt project from your browser (no command-line use is required).

To connect to dbt Cloud:

  1. Open the Advanced tab on the Sync Settings page and define your account Id and API key.

    • Your account Id is available in the URL that you see when you select Develop > Classic IDE in dbt Cloud.

      Example: https://cloud.getdbt.com/ide/#/accounts/XXXXXX/projects/123456/develop (The account Id is represented by XXXXXX in this path.)

    • The API key is available by clicking the Settings icon (at the top right of the dbt Cloud window) and then selecting Account Settings > API Access. The key is listed in the API section at the bottom of the API Access page.

  2. Click Test Connection to confirm that you are connected.

Creating a DBT Cloud Transformation

To create a transformation:

  1. Click the Transformations tab in the Sync Admin console.

  2. Click Create Transformation. Then, fill out the fields in the Create New Transformation modal, as follows:

    • Transformation Name - Enter a name for your transformation.

    • Type - Select dbt Cloud.

    • dbt Job - Select the job from the drop-down list.

  3. Click Create.

After you create your transformation, you can adjust settings, set notification alerts, and schedule the transformation to run from the Job Settings tabs. These tabs and settings are explained in the next section.

Understanding the Transformations Detail Page

The Transformations detail page consists of two sections:

  • Connection - Displays the connection, which cannot be modified.

  • Job Settings - Contains various job settings that you can define for transformations.

Job settings appear on four tabs: Status, Triggers, Notifications, and Settings. The settings on these tabs are explained in the following sections.

Status Tab

The Status tab displays the run start date and time, the status, and details (records affected, failures, and so on) for each transformation. This tab also enables you to see the log and history for a transformation. When you click an individual transformation, a modal opens that contains the log for that transformation.

Triggers Tab

The Triggers tab enables you to set the following options to trigger when your transformation runs:

  • None - Turns off scheduling options, allowing you to start a transformation manually.

  • Schedule - Enables you to set a specific schedule by which to run your transformation. This setting also activates the Intervals field. You can choose more granular settings for the Intervals setting:

    • Minute - This setting activates a Minutes option where you can specify increments in minutes (every 10, 15, 20, or 30 minutes) in which to run your transformation.

    • Hourly - This setting activates a Minutes Past the Hour field so you can set your transformation to run hourly at specific numbers of minutes past the hour (0-59). For example, you might want to run the transformation every hour at 15 minutes past the hour.

    • Daily - This setting activates a Time field so you can set the specific time each day that you want your transformation to run.

    • Weekly - This setting activates two fields, Day and Time, so you can specify a specific day of the week and time to run your transformation.

    • Monthly - This setting also activates Day and Time fields. However, for Day, you choose a specific day of each month (values of 1 through 31 or Last Day) along with the time.

  • Advanced - This setting activates a Cron Expression field where you can write a cron job to initiate a transformation run.

    The Schedule type is useful when you run multiple jobs running. You cannot use the After Job option for multiple jobs because you cannot specify after which job to run the transformation.

  • After Job - This option activates a Job option that presents a list of all available jobs. You can choose the specific job after which you want the transformation to run.

    You can also specify various tasks to run on your data by clicking Add Tasks under the Job field. In the Add Tasks modal that opens, simply select the task or tasks that you want to run and click Add.

    If the query from your job tasks produces any new records in your destination, then the transformation runs. If the query does not produce new records (that is, nothing changes in the source or destination), then the transformation does not run.

  • API Trigger - This field provides an example of how to run your transformation from an API.

Notifications Tab

You can set email notifications for transformations on the Notifications tab of the Transformations Job Settings page. You can choose to receive an email notification for all transformations, or you can choose to receive notifications only for transformations in which errors occur.

Settings Tab

The Settings tab contains these standard options:

  • Command Timeout

  • Project Path

  • DBT Schema

  • Threads