DBT Transformations
Version 24.2.9064
Version 24.2.9064
DBT Transformations
CData Sync now integrates with dbt Core (a data building tool by dbt Labs™), which enables you to create powerful transformations. This integration enables you to develop, control, and run transformations to build Python models in a supported destination database.
You have complete control over your project repository and the ability to write SQL SELECT statements to transform your data and build models. Sync simply creates a pipeline for you to run those transformations in your chosen destination.
If you only need to execute simple transformations, then basic SQL transformations are sufficient. However, for advanced modeling requirements, transformations that use dbt Core models offer more robust capabilities. For example, if your models contain dependencies, a transformation that uses a dbt Core project recognizes and honors those dependencies, enabling the models to build in the correct order.
Supported Destinations
Transformations of type DBT are supported for the following destinations:
-
Snowflake
-
Amazon Redshift
-
Databricks
-
PostgreSQL
-
Google BigQuery
dbt Core Details
Setting Up Tools for DBT Transformations
You need to set up dbt Core and other tools first to enable DBT transformations, as follows:
-
Install Git for Windows and Python 3.7 or later for Microsoft Windows.
Note: An application like Git is not required. However, such a repository is recommended because it provides versioning, enables you to share projects, and gives you more control of your source code.
In Git, locate and copy the following information for later use in the Sync settings for a DBT transformation:
-
Git Repository URL - The repository URL is available in your Git repository window. Click Code to open the Clone dialog. The URL is on the HTTPS tab in the dialog. Copy the URL by clicking the copy icon on the right end of the URL field. Save the icon for use later in Sync.
-
Personal Access Token - To obtain your personal access token:
- Click your profile icon in the top right of the Git window and select Settings to open the Profile settings page.
- Select Developer Settings > Personal access tokens. This selection opens the New personal access token pane.
- Select the Repo check box in the Select scopes category. (You can set other criteria in this category as well, but it is not necessary.) For example, you might want to choose an expiration period (or no expiration) for your token under the Expiration category.
- Click Generate token at the bottom of the page. Copy and save the generated token for use later in Sync.
For more details about obtaining a personal access token, see Creating a personal access token.
-
-
Install dbt Core and the adapter plug-in for your destination on the same machine where Sync is installed. Plug-ins are available for each of the destinations that are listed earlier.
-
Download your dbt projects locally by cloning your Git repository to your local machine. Sync clones the repository upon setup and before each run to ensure that any changes to the project are reflected in the next run from Sync.
-
Create a transformation in Sync (using dbt projects that contain YML or SQL files), as described in the next section.
Creating a DBT Transformation
To create a dbt transformation:
-
Open Sync and navigate to the Transformations page.
-
Select the job on which you want to run a transformation. Then, click Add Transformations.
-
Fill out the fields in the Add Transformation dialog box, as follows:
-
Name: Enter a name for your transformation.
-
Type: Select dbt Core as the transformation type.
-
Project Folder: Select one of the following options. The default option is Local.
-
Local: Enter the path for your existing local project folder. For this option, you also need to enter your dbt project folder page in the Folder Path field.
-
Github: Enter the URL for the project that you created earlier in Git. For this option, enter the following information:
-
Git Repository URL: Enter your Git repository URL in the form https://github.com/Owner/Repository.git.
-
Token: Enter the personal access token that you copied earlier in Git.
-
-
-
Destination Schema - Add a destination schema of your choice.
-
-
Click Add Transformation.
dbt Cloud Details
Connecting CData Sync to dbt Cloud
dbt Cloud is an integrated development environment (IDE) in which you can write, run, and test code as well as manage version control in your dbt project from your browser (no command-line use is required).
To connect to dbt Cloud:
-
Open the Advanced tab on the Sync Settings page and navigate to the dbt Cloud Settings section.
-
Click the edit icon for that section to open the dbt Cloud Settings dialog box.
-
Define your account Id and API key in the dbt Cloud Settings section. Your account Id is available in the URL that you see in your dbt Cloud account settings. To retrieve the Id, sign in to your dbt Cloud account and select Settings (the gear icon) > Account Settings. The account Id follows the /accounts/ path component, as shown in the following example:
Example: https://cloud.getdbt.com/settings/accounts/1234
In this example, the account Id is 1234.
The account Id is also available in the URL that is displayed when you click the Develop tab, as shown below:
Example: https://cloud.getdbt.com/develop/1234/projects/ProjectId
-
Select the region for your dbt Cloud API server from the Region list. Values for this setting are as follows:
-
North America
-
Europe (EMEA)
-
Australia (APAC)
-
Custom Single Tenant
If you are not sure about your region, you can find it in the URL for your dbt Cloud API server. For example, if the URL for your dbt Cloud API server is https://cloud.us1.dbt.com, then your API server region is North America. If your API server URL is https://emea.dbt.com, then your API server region is Europe (EMEA).
Note: Currently, your URL might use cloud.getdbt.com instead of us1.dbt.com for the North America region. However, this URL will be removed in the future.
-
-
Enter either a dbt Cloud personal access token (PAT) or a service token in the Token field.
-
If you do not have a PAT, you can create it by selecting Account Settings > API Tokens > Personal Tokens in your instance of dbt Cloud. Copy the new token and return to Sync to paste it into the Token field in the dbt Cloud Settings dialog box.
-
If you do not have service token, you can create it by selecting Account Settings > Service Tokens > New Token. Copy the new token and return to Sync to paste it into the Token field in the dbt Cloud Settings dialog box.
-
-
Click Test Connection to confirm that you are connected.
Creating a DBT Cloud Transformation
To create a dbt Cloud transformation:
-
Open Sync and navigate to the Transformations page.
-
Select the job on which you want to run a transformation. Then, click Add Transformations.
-
Fill out the fields in the Add Transformation dialog box, as follows:
-
Name: Enter a name for your transformation.
-
Type: Select dbt Cloud as the transformation type.
-
dbt Job: Select a dbt job from the drop-down list.
-
-
Click Add Transformation.
Understanding the Transformations Page
The Transformations page contains a table that lists the details of your transformations including the destination used, whether the transformation is scheduled, the last run status, and the next run. You can also filter the table by transformation type, as shown below.
To use bulk operations on your job (Run, Disable Schedule, Enable Schedule, and Deselect), select the checkbox next to one or more jobs.
You can manage each transformations from the options menu (…). To see that menu, position your cursor at the end of the transformation row. The following options are available from that menu:
-
Edit - Enables you to edit your transformation settings, including modifying general settings, setting notification alerts, scheduling the transformation to run, and viewing the history for a transformation.
-
Download Log File - Downloads the log file for your last transformation run.
-
Run Details - Opens a dialog box that displays information about your transformation run. The dialog box contains two tabs of information:
-
Details: This tab displays the run date, the run status, the amount of time it took to run the transformation, and the number of rows affected.
-
Logs: This tab lists the logs that are available for your transformation. You can download individual logs here by clicking the download icon to right of the log name. You can also download all logs by clicking Download All Log Files.
-
-
Remove - Permanently deletes the transformation.
Specifying Transformation Settings
To modify transformation settings, select … > Edit at the end of the transformation row. The settings page contains two tabs from which you can manage settings:
-
Overview: This tab provides Settings, Notifications, and Triggers categories where you can modify those setting types.
-
Transformations History: This tab provides details about your transformation job run. Details include Run Date, Status, Records Affected, and Run Time. You can also filter your report to show transformations by status (All Status, SUCCESS, FAILED, or ERROR). In addition, you can export the history by clicking Export History.
Overview Tab
The Overview tab enables you to manage general settings, triggers, and notifications. You can also see details from the last run of your transformation on this page.
Settings Category
The Settings category enables you to set logfile verbosity and command timeout, as follows.
-
Select … > Edit at the end of the transformation row. This action opens the Edit Settings dialog box.
-
Select a value for Logfile Verbosity.
-
Enter a timeout value in the Command Timeout field.
-
Click Save.
Trigger Category
The Triggers category enables you to set a specific schedule by which to run your transformation, as follows.
-
Click Configure in the Triggers category. This action opens the Trigger dialog box.
-
Specify the type of trigger that you want. You can select from the following types:
-
None - Turns off scheduling options, allowing you to start a transformation manually.
-
Scheduled - Enables you to set a specific schedule by which to run your transformation. This setting also activates the Intervals and Minutes Past The Hour fields. You can choose more granular settings for the Intervals setting:
-
Minute - Activates the Every option where you can specify increments in minutes (every 10, 15, 20, or 30 minutes) in which to run your transformation.
-
Hourly - Activates a Minutes Past The Hour field so you can set your transformation to run hourly at specific numbers of minutes past the hour (0-59). For example, you might want to run the transformation every hour at 15 minutes past the hour.
-
Daily - Activates a Time of Day field so you can set the specific time each day that you want your transformation to run.
-
Weekly - Activates two fields (Days and Time of Day) where you can specify a specific day of the week and time to run your transformation.
-
Monthly - Activates Day and Time fields. However, for Day, you choose a specific day of each month (values of 1 through 31 or Last Day) along with the time.
-
Cron - Activates a Cron Statement field where you can write a cron job to initiate a transformation run.
The Scheduled type is useful when you run multiple jobs. You cannot use the After Job option for multiple jobs because you cannot specify after which job to run the transformation.
-
-
After Job - Activates a Job drop-down list that presents a list of all available jobs. You can choose the specific job after which you want the transformation to run.
You can also specify various tasks to run on your data by clicking Select Specific Tasks under the Job field. Simply select the task or tasks that you want to run.
-
-
Click Save.
Note: To disable schedule triggers, slide the Trigger toggle switch to the left (the switch turns gray when it is disabled). The toggle switch appears to the right of Trigger in the category box.
Notifications Category
The Notifications category enables you to set email notifications for transformations, as follows.
-
Select … > Edit at the end of the transformation row for which you want to set notifications.
-
Click Configure in the Notifications category.
-
Set the following options in the Notifications dialog box:
-
To - The email to which to send notifications.
-
Subject - The subject for notification emails.
-
Only send notifications on errors - An option that you can select to specify that notification emails should be sent only when the job encounters an error. If you do not select this option, Sync sends notifications after each job run.
-
-
Click Save.
Note: To disable notifications, slide the Notifications toggle switch to the left (the switch turns gray when it is disabled). The toggle switch appears to the right of Notifications in the category box.
Transformations History Tab
This tab provides details about your transformation job run. Details include Run Date, Status, Records Affected, and Run Time. You can also filter your report to show transformations by status (All Status, SUCCESS, FAILED, or ERROR). In addition, you can export the history by clicking Export History.