Incremental Updates

CData Sync supports Incremental Updates for data warehouses. Instead of querying the entirety of your data every time, Sync only queries for data that has been added or changed since the last time your job ran, and it then merges that data into your data warehouse. This minimizes bandwidth usage and the latency of synchronization.

CData Sync pulls your data using two main methods of incremental update: Incremental Check Column and Log-Based Replication. These methods are outlined below.

Incremental Check Column

An Incremental Check Column is either a datetime or integer-based column used by Sync to identify new or modified records when replicating. Every time a record is added or updated in the source, the value for this column increases. Sync can use this column as a criterion during the extract to ensure that only new or changed records are returned. Sync then stores the new maximum value of the column to be used in the next replication.

Replication by Incremental Check Column can work using two different data types:

  • DateTime Incremental Check Columns: A “Last Modified” or “Date Updated” column that represents when the record was last updated.
  • Integer-based Incremental Check Columns: An auto-incrementing ID or rowversion that increments every time a record is added or updated.
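The filtering described above can be sketched in a few lines. This is a simplified illustration, not Sync's implementation: the table name, column name, and stored maximum below are hypothetical, and real extracts go through the configured connection rather than string-built SQL.

```python
def build_incremental_query(table, check_column, last_max):
    """Build a SELECT that returns only rows changed since the last run.

    `table`, `check_column`, and `last_max` are illustrative names; the
    actual check column and stored value depend on your source.
    """
    if last_max is None:
        # First run: no stored maximum yet, so extract everything.
        return f"SELECT * FROM {table}"
    return f"SELECT * FROM {table} WHERE {check_column} > '{last_max}'"

# After extracting, the new maximum is stored so the next run resumes there.
rows = [
    {"Id": 1, "DateUpdated": "2024-01-05 10:00:00"},
    {"Id": 2, "DateUpdated": "2024-01-06 08:30:00"},
]
new_max = max(r["DateUpdated"] for r in rows)
```

On the next run, only records with a check-column value greater than the stored maximum are returned.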

Log-Based Replication

Some sources support Log-Based Replication, where a source uses a logfile to log events (Insert, Update or Delete) that cause changes in the database. Rather than querying the source table for changes, Sync instead reads the logfile for any change events. Sync then extracts those changes for replication and stores the current log position for the next replication.
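The replay loop can be illustrated with a toy change log. This is a conceptual sketch under assumed structures (a log of position/operation/key tuples and a destination modeled as a dict), not how any particular database exposes its log.

```python
def apply_log_events(events, last_position, destination):
    """Replay change events recorded after `last_position` into
    `destination` (a dict keyed by primary key); return the new position."""
    position = last_position
    for pos, op, key, row in events:
        if pos <= last_position:
            continue  # already replicated in a previous run
        if op in ("INSERT", "UPDATE"):
            destination[key] = row
        elif op == "DELETE":
            destination.pop(key, None)
        position = pos  # remember how far we have read
    return position

log = [
    (1, "INSERT", 10, {"Id": 10, "Name": "a"}),
    (2, "UPDATE", 10, {"Id": 10, "Name": "b"}),
    (3, "DELETE", 10, None),
]
dest = {}
new_pos = apply_log_events(log, 0, dest)
```

Because the stored position advances with each run, re-running the job never replays events it has already applied.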

The sources listed below support Log-Based Replication:

Capturing Deletes

CData Sync automatically captures deletes, which ensures accuracy in your destination. Sync retrieves a list of deleted records from the source by calling the API or by using the Change Tracking feature.

If the source allows CData Sync to detect data that has been deleted, you can control how CData Sync handles deletes using the Deletion Behavior option in the Advanced section:

  • Hard Delete (Default): If a delete is detected in the source, Sync will remove that record from the destination table.
  • Soft Delete: Sync will add the “_cdata_deleted” column to your destination table. If a delete is detected in the source, Sync will set the value to “true” in the destination.
  • Skip Delete: Sync will ignore deleted records in the source.

Note that some APIs do not allow Sync to detect deleted records, as indicated in the Source Table Information. In these cases, Deletion Behavior will be ignored.
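The three behaviors can be summarized as follows. This sketch models the destination as a dict keyed by primary key; the option names match the documentation above, but the function itself is illustrative.

```python
def handle_delete(behavior, key, destination):
    """Apply one detected source delete according to the configured
    Deletion Behavior."""
    if key not in destination:
        return
    if behavior == "Hard Delete":
        del destination[key]                          # remove the record
    elif behavior == "Soft Delete":
        destination[key]["_cdata_deleted"] = "true"   # flag it, keep the row
    elif behavior == "Skip Delete":
        pass                                          # leave it unchanged
```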

Data Types

CData Sync recognizes a large number of data types and, in situations where the data type isn’t strictly defined, Sync can infer the data type based on your data. Sync recognizes the following data types:

  • Boolean
  • Date
  • Time
  • TimeStamp
  • Decimal
  • Float
  • Double
  • SmallInt
  • Integer
  • Long
  • Binary
  • Varchar
  • GUID

Known Data Types

There are many sources, mostly relational databases and some APIs, where Sync can automatically detect the data types for the schema. Examples of this include SQL Server, Oracle, and Salesforce. When the source column data types are known, Sync automatically creates matching data types in the destination.
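Conceptually, this is a lookup from source types to Sync's type system. The mapping below is hypothetical (a few SQL Server-style names chosen for illustration); the actual mapping depends on the specific source and destination.

```python
# Illustrative mapping only; real mappings are driver-specific.
TYPE_MAP = {
    "int": "Integer",
    "bigint": "Long",
    "bit": "Boolean",
    "datetime2": "TimeStamp",
    "nvarchar": "Varchar",
    "uniqueidentifier": "GUID",
}

def destination_type(source_type):
    # Fall back to Varchar when the source type is not recognized.
    return TYPE_MAP.get(source_type.lower(), "Varchar")
```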

Inferred Data Types

In situations where a data type isn’t specified, CData Sync can infer the data type by scanning the first few rows of data to determine what the column data type should be.

When Sync detects a String type with an unknown column size, the default size of the column is treated as 2000. In a relational database such as SQL Server, Sync would create a varchar(2000) field for this type.

Sync automatically selects the smallest type possible when assigning data types to fields that aren't strictly defined: it reads the first row and selects the smallest data type for each column, then reads the next row and checks that the data still fits those types. If a value no longer fits, the data type is widened. Sync repeats this up to the RowScanDepth (typically 50 or 100 rows), at which point the final data types are set.

For example, in a source like CSV, Sync uses "RowScan" to read through the first rows of the file and dynamically determine the data types for each column.
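The widening loop can be sketched like this. This is a simplified model, not Sync's actual inference: it only distinguishes Integer, Double, and Varchar, whereas Sync recognizes the full type list above.

```python
def infer_types(rows, scan_depth=50):
    """Infer one type per column by scanning the first `scan_depth` rows,
    widening whenever a value no longer fits the current type."""
    ladder = ["Integer", "Double", "Varchar"]

    def rank(value):
        # Smallest type the value fits into, as an index into `ladder`.
        try:
            int(value)
            return 0
        except ValueError:
            pass
        try:
            float(value)
            return 1
        except ValueError:
            return 2

    ranks = {}
    for row in rows[:scan_depth]:
        for col, value in row.items():
            # Keep the widest type seen so far for this column.
            ranks[col] = max(ranks.get(col, 0), rank(value))
    return {col: ladder[r] for col, r in ranks.items()}

csv_rows = [
    {"id": "1", "price": "9.99", "name": "widget"},
    {"id": "2", "price": "12", "name": "gadget"},
]
```

Here `price` starts as Integer when the scan sees "12", but the value "9.99" widens it to Double.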


Transformations

In data pipelines, transformations are a way of cleansing, aggregating, or otherwise reshaping data so that it is easier to use for reporting or data analysis. CData Sync supports two common approaches to data transformations when building data pipelines:

ETL: ETL (Extract, Transform, Load) has been the traditional approach in analytics for several decades. It was originally designed to work with relational databases, which have historically dominated the market. ETL requires the transformations to happen before the loading process. Data is extracted from data sources and then deposited into a staging area. Data is then cleaned, enriched, transformed, and finally loaded into the data warehouse. For more on ETL, refer to the ETL Documentation.

ELT: ELT (Extract, Load, Transform) is a method of data extraction where modifications to the data, such as Transformations, take place after the loading process. The vastly greater storage and scaling capabilities of modern cloud data warehouses allow data to be moved in its entirety and then modified after the fact. The primary difference between ETL and ELT is the order in which the steps take place.

ELT transformations are SQL scripts executed in the destination on your data. Transformations use the processing power of the data warehouse to quickly aggregate, join, and clean your data based on your analytics and reporting needs. By organizing your data with Transformations and Mapping, you can receive your data in the form most useful to you as it moves through your pipeline. Like Jobs, Transformations support multiple queries separated by ';', scheduled execution, and email notifications on completion. For more on ELT, refer to the ELT Documentation.
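A transformation is just a SQL script run against the destination. In this sketch, sqlite3 stands in for the destination warehouse, and the table and column names are made up; the point is that a multi-statement, ';'-separated script executes in order inside the destination.

```python
import sqlite3

# A ';'-separated transformation script, executed in the destination.
transformation = """
    CREATE TABLE sales_summary AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    DELETE FROM sales_summary WHERE total IS NULL;
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("east", 5.0), ("west", 7.5)])
conn.executescript(transformation)  # runs each statement in order
totals = dict(conn.execute("SELECT region, total FROM sales_summary"))
```

Because the script runs where the data already lives, the aggregation uses the warehouse's own processing power rather than pulling rows out and back in.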

API Access

CData Sync’s built-in REST API provides a flexible way to manage the application. Everything that can be accomplished within the administration console UI can also be accomplished through RESTful API calls.

The Job Management API allows you to create, update, and execute jobs:

  • Edit a Job
  • Start a Job
  • Retrieve Job Status
  • Manage Connections

For more information on API Access, refer to the REST-API documentation.
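As an illustration, a job-start call is an ordinary authenticated HTTP request. The host, route (`/executeJob`), and request body below are placeholders, not confirmed endpoint details; consult the REST-API documentation for the actual routes and authentication scheme.

```python
import base64
import json
import urllib.request

def build_start_job_request(base_url, job_name, user, password):
    """Construct (but do not send) a POST request to start a job.
    All endpoint details here are placeholder assumptions."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/executeJob",                 # placeholder route
        data=json.dumps({"JobName": job_name}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

req = build_start_job_request("http://localhost:8019/api.rsc",
                              "nightly_load", "admin", "password")
# urllib.request.urlopen(req) would send it; omitted here.
```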

Firewall Traversal

CData Sync can run anywhere. This makes it ideal for customers who have some systems in the cloud and others that reside inside their internal networks. In these scenarios, customers can install Sync to run inside their network, so there is no need to expose ports over the internet or open firewalls, create VPN connections, etc.

Sync’s ability to run anywhere also greatly reduces the amount of latency, as you can run CData Sync close to the source or destination, improving the performance of your ETL or ELT jobs.

Schema Changes

Your data is constantly changing, and Sync makes sure those changes are accurately represented at all times. Every run, Sync compares the source schema to the destination schema to look for differences. If a difference in structure between the two is detected, Sync will modify the destination schema to ensure the source data fits by adding columns or increasing the column size.

Sync can make a number of different changes. These include:

  • If a column is detected in the source table that does not exist in the destination table, Sync will alter the destination table by adding the column.
  • If the data type in the source increases in size, Sync will alter the destination table by updating the column. This includes increasing the column size of a string column (varchar(255) -> varchar(2000)) or the byte size of non-string columns (smallint -> integer).

Sync never makes destructive changes to the destination:

  • Sync will never delete columns from your destination table, even if the column was removed from your source table.
  • Sync will never shrink the size of a destination column, even if the data type has been updated in the source (varchar(2000) -> varchar(255)).
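The comparison can be sketched as a diff that only ever adds or widens. Column representations and the generated action strings below are simplified illustrations, not Sync's actual DDL.

```python
def plan_schema_changes(source, destination):
    """Return actions that make `destination` fit `source`.
    Columns are dicts of name -> (type, size). Never drops or shrinks."""
    actions = []
    for name, (stype, ssize) in source.items():
        if name not in destination:
            # Column exists only in the source: add it.
            actions.append(f"ADD COLUMN {name} {stype}({ssize})")
        else:
            dtype, dsize = destination[name]
            if stype == dtype and ssize > dsize:
                # Source column grew: widen the destination column.
                actions.append(f"ALTER COLUMN {name} {stype}({ssize})")
    # Columns present only in the destination are left untouched.
    return actions

src = {"Name": ("varchar", 2000), "Age": ("varchar", 50)}
dst = {"Name": ("varchar", 255)}
changes = plan_schema_changes(src, dst)
```

Note that nothing in the plan ever removes a destination column or reduces its size, matching the guarantees above.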


If you have questions or comments, we’d love to hear them! Go ahead and send us an email.