CData Connector [Deprecated]

Version 23.4.8841


The CData Connector leverages industry standard connectivity from CData Software, our parent organization. The CData Connector interfaces directly with CData ADO.NET and JDBC drivers, enabling connectivity to hundreds of cloud and on-premises applications, databases, and more.

Overview

The CData Connector uses CData drivers to translate database operations into API calls specific to the target data source. As a result, the cloud application data is available as a set of virtual tables and stored procedures.

Each CData Connector must be configured with the appropriate CData driver to connect to the target data source. Once a connection has been made, the connector supports creating Input Mappings and Output Mappings that determine an XML structure for interacting with the data source’s virtual tables.

After the Input and Output mappings have been generated, XML input files will automatically be converted into API calls to insert data, and data retrieved from the data source via API calls will be pushed out as XML files.

To get started with the CData Connector, the CData driver for the target data source must be installed.

Installing the CData Driver

The ADO.NET provider (Windows edition) or JDBC driver (Java edition) for the appropriate data source must be installed before any connection to the data source can be made.

ADO.NET

Downloading and running the CData installer for an ADO.NET provider is all that is required for the CData Connector to recognize the provider and make it available in the connector dropdown settings.

  • Find the desired data source in the list of CData data sources. Click on the tile for a data source to bring up the page for that data source.

  • Find the top toolbar that lists the available driver types and click on the ADO.NET tile.

  • Click the Download button on the resulting page.

  • Download the trial (30 days of full functionality) or the full provider (requires a purchased license). Contact the Arc licensing team at [email protected] to arrange for provider licensing.

  • Run the downloaded installer.

  • Restart the CData Arc server to allow the CData Connector to detect the new provider.

JDBC

For the CData Connector to recognize a JDBC driver, the driver must be installed and its files placed in the lib folder of the Java server hosting Arc.

  • Find the desired data source in the list of CData data sources. Click on the tile for a data source to bring up the page for that data source.

  • Find the top toolbar that lists the available driver types and click on the JDBC tile.

  • Click the Download button on the resulting page.

  • Download the trial (30 days of full functionality) or the full driver (requires a purchased license). Contact the Arc licensing team at [email protected] to arrange for driver licensing.

  • Extract and run the setup.jar file from the downloaded archive.

  • Navigate to the driver installation directory and find the driver .jar file and .lic (license) file inside the /lib/ folder.

  • Copy the driver .jar and .lic into the /lib/ folder of the Java server hosting Arc.

  • Restart the Arc server to allow the CData Connector to detect the new driver.

Establishing a Connection

Once the appropriate driver or provider is available in the connector, the connection settings required to access the application’s API must be supplied. The connector will automatically populate the settings tab with the connection settings relevant to the selected cloud application data source.

Authentication

Authentication credentials are specific to the target data source. Credentials should be procured directly from the cloud application so that they can be included in the API calls made by the connector.

OAuth Authentication

Some data sources require OAuth authentication, which must be initiated using the Connect button in the connector settings panel. This button will direct the browser to an authentication portal for the data source, where the CData driver must be given permissions to access the data source. Credentials entered in this portal are not seen or stored by Arc (the portal returns an authentication token representing the credentials without exposing them).

Arc does store the OAuth token and refresh token required to keep the OAuth authentication current, so providing credentials through the OAuth portal is only required for the first time connecting to the data source.

Input Mappings

The CData Connector represents the data stored in the target cloud application as though the data were stored in tables in a relational database. After connecting to the data source, the connector scans the data source for the available virtual tables and stored procedures. An Input Mapping represents an INSERT/UPDATE into one of these virtual tables.

Input Mappings are represented as XML, and files processed by the CData Connector that match the XML structure of an Input Mapping will automatically be converted into the appropriate INSERT or UPDATE queries against the target data source.

The process of converting XML to database queries is detailed in the Input Mappings as XML section; conceptually, the connector is simply reading values from XML elements and using them when making INSERT/UPDATE statements.

Creating an Input Mapping

Click the ‘Add mapping’ button (+) to create a new mapping. Select the target table/view from the list of available virtual tables in the data source to bring up the Mapping Editor panel. In the Mapping Editor, select the virtual table columns that should be included in the INSERT/UPDATE. For simple INSERTs, this is all that is required to generate a mapping.

After an Input Mapping is created, its XML representation can be viewed by clicking the ‘Code’ button (</>) next to the mapping name.

UPSERT

UPSERT settings govern how the connector decides whether to INSERT or UPDATE. The most common case is to set UPSERT by to the primary key of the table. With this configuration, the connector reads the primary key value from the input XML, queries the database to see if this key already exists in the table, updates the record if it does exist, and inserts a new record if it does not. UPSERT by can be set to columns other than the primary key, but this might result in multiple records being returned from the database. In this case, the connector updates the first record that is returned.
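Conceptually, the UPSERT-by-primary-key flow described above can be sketched with the following queries. This is illustrative only (the actual statements are generated by the CData driver), and the Customers table with Name and Phone columns is a hypothetical example:

```sql
-- Sketch of UPSERT by primary key; actual queries are generated by the driver.
-- 1. Check whether the primary key from the input XML already exists:
SELECT CustomerID FROM Customers WHERE CustomerID = @CustomerID;

-- 2a. If a record was returned, update it:
UPDATE Customers SET Name = @Name, Phone = @Phone WHERE CustomerID = @CustomerID;

-- 2b. If no record was returned, insert a new one:
INSERT INTO Customers (CustomerID, Name, Phone) VALUES (@CustomerID, @Name, @Phone);
```

As elsewhere in this section, the ‘@’ syntax indicates a value read from the corresponding element in the input XML.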

If Perform this query to select the UPSERT key is set to a SELECT query, the connector compares the results of this query to the input XML to determine whether to INSERT or UPDATE. This is useful when the primary key for the target table is not available in the input XML, but can be fetched by referencing another column.

For example, imagine that the primary key CustomerID needs to be used to perform UPSERT logic. The input XML does not have the CustomerID, but it does have a CustomerPhoneNumber that uniquely identifies each customer. The UPSERT query for CustomerID could be set to the following:

SELECT CustomerID FROM Customers WHERE CustomerPhoneNumber = @CustomerPhoneNumber

The ‘@’ syntax in the above query indicates that the value should be read from the CustomerPhoneNumber element in the input XML.

The connector runs this query against the database; the query returns a CustomerID value if a record already exists with the same CustomerPhoneNumber. The connector then uses this CustomerID to UPDATE the record (otherwise a new record is INSERTed).

INSERT

If UPSERT is disabled in the Input Mapping, the connector will automatically set the query action to INSERT. The connector will attempt to INSERT new records for each input XML file without querying the data source to see if the record already exists.

LOOKUP

Some columns have a LOOKUP option, which indicates that the connector has detected a foreign key relationship between these columns and another table in the database. If LOOKUP is enabled, the connector queries this external table prior to INSERTing, and uses the value that is returned in the INSERT. The LOOKUP by field determines which column is used to query the external table.

For example, imagine INSERTing a Purchase Order (PO) into a database where one of the fields in the ‘PO’ table is an internal CustomerID. Imagine that the input XML has a CustomerName element, but not the customer’s internal ID. However, the ‘Customers’ table in the database has both the name and ID, so it can be used to look up the ID value based on the name. The connector recognizes that the CustomerID column of the ‘PO’ table is a foreign key, and enables LOOKUP for that column. Setting LOOKUP by to CustomerName instructs the connector to query the ‘Customers’ table with a statement like this:

SELECT CustomerID FROM Customers WHERE CustomerName = @CustomerName

The ‘@’ syntax in the above query indicates that the value should be read from the CustomerName element in the input XML.

With this configuration, the connector first retrieves the CustomerID from ‘Customers’, then uses that value when INSERTing into the ‘PO’ table.

Input Mappings as XML

Input Mappings are represented in XML, with the following structure: a parent element identifies the virtual table to insert into, and each child element corresponds to a column in the target table. The Input Mapping establishes the connection between XML elements and data source columns, so that the connector can read values from incoming XML documents and use them in INSERT or UPDATE queries.

The following XML is an example Input Mapping for a ‘Students’ table:

<Items>
    <Students action="insert">
        <StudentID key="true" />
        <Name/>
        <Grade type="int"/>
        <GPA/>
    </Students>
</Items>

In the above example, StudentID, Name, Grade, and GPA are each columns of the ‘Students’ table. When the connector processes an XML file that matches this structure, it will automatically insert the values from that XML into the ‘Students’ table. The following is an example XML input file for the above mapping:

<Items>
    <Students>
        <StudentID>12554</StudentID>
        <Name>Ferris Bueller</Name>
        <Grade>11</Grade>
        <GPA>1.8</GPA>
    </Students>
    <Students>
        <StudentID>12921</StudentID>
        <Name>Hermione Granger</Name>
        <Grade>9</Grade>
        <GPA>4.0</GPA>
    </Students>
</Items>

When the connector processes the above input file, it will INSERT or UPDATE two records in the ‘Students’ table, one for each Students element.
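For illustration, with UPSERT disabled the connector would translate the two Students elements into INSERT statements along these lines. This is a sketch only; the actual statements are built by the CData driver from the mapped columns (StudentID, Name, Grade, GPA):

```sql
-- Illustrative only; actual statements are generated by the CData driver.
INSERT INTO Students (StudentID, Name, Grade, GPA)
  VALUES (12554, 'Ferris Bueller', 11, 1.8);
INSERT INTO Students (StudentID, Name, Grade, GPA)
  VALUES (12921, 'Hermione Granger', 9, 4.0);
```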

Batching Input

The CData Connector supports batching to improve performance when inserting large sets of data. Batching is configured via two fields, TransactionSize and BatchInputSize. When these fields are set to positive integers, the connector will insert [BatchInputSize] records in a single operation (batch), and will insert [TransactionSize] records in a single connection (transaction). For example, with BatchInputSize set to 100 and TransactionSize set to 500, a file of 1,000 records is inserted as two transactions, each consisting of five batches of 100 records.

An XML input file can contain more records than a single transaction size, in which case the connector will insert the data over multiple transactions.

If an error occurs during a batch insert, the connector will roll back the transaction and cache the index of the first record in the failed transaction. When the input file is retried, only records after the cached index will be inserted into the data source.

UPDATE Action

By default, Input Mappings will use INSERT or UPSERT as the query action. The action associated with an Input Mapping is displayed in the XML as an ‘action’ attribute of the table_name element, for example:

<Students action="insert">

Some cloud application data sources support better performance when the action is set to UPDATE. UPDATE actions assume that the data inserted into the data source is associated with a record that already exists. When this is true, it may be worth setting the ‘action’ attribute to UPDATE to take advantage of the better back-end performance:

<Students action="update">

Note that the Amazon DynamoDB data source is a special case where the back-end data processing includes UPSERT logic (in other words, an UPDATE call to a record that does not exist will result in a new record being created). Setting the ‘action’ attribute to UPDATE is recommended when inserting data into Amazon DynamoDB to capitalize on the performance benefits.

INSERTing into Child Tables

Input Mappings support INSERTing into multiple virtual tables with a single mapping. When creating an Input Mapping with multiple tables, the tables should have a foreign key relationship to each other such that one table is the child of another table. Otherwise, use multiple Input Mappings to insert into tables that are not related through a foreign key.

To INSERT into child tables, first create an Input Mapping that targets the parent table. Then add child tables from within the Input Mapping Editor using the +Add button in the upper-left. Select the appropriate table from the list of tables and the designer will automatically populate with the hierarchical structure of tables. Select the columns of the child table that should be included in the Input Mapping.

REF

Some columns within child tables will be marked with a blue ‘REF’ tag, which indicates that the connector has detected that these columns correspond to the primary key in the parent table. After making the insert into the parent table, and before making the insert into the child table, the connector will query the parent table for the primary key of the record it just inserted. The connector will then use this value when inserting into the child table.

This is useful when the primary key of the parent table is generated by the data source, such as an AUTOINCREMENT key. In this case, the column is not present in the input XML, so the connector does not know its value until after the record has been INSERTed into the parent table.

In the Input Mapping XML, the ‘ref’ attribute indicates that a column can be referenced later or that the value for a column should be read from another column in the mapping. In order to reference the value of columns that are generated by the database (e.g. AUTOINCREMENT primary keys), the column element should include a ‘ref’ attribute set to the reserved value LAST_INSERT_ID:

<film action="upsert">
    <film_id key="true" upsert="film_id" ref="@@LAST_INSERT_ID" />

This indicates that the film_id value will be retrieved after the insert and can be referenced later in the mapping (usually in a child table). To reference this value later, set the ‘ref’ attribute to the name of the AUTOINCREMENT column:

<film_id key="true" ref="@film_id" />

In the above example, film_id is the name of the primary key in the parent table, and also the name of a column in the child table. The ‘ref’ syntax instructs the connector to retrieve the film_id from the parent table after inserting, and to use that retrieved value as the film_id column in the child table.

LOOKUP

LOOKUP logic for child tables is the same as parent tables. Please see the LOOKUP section in Creating an Input Mapping for more details.

Child Table XML

The Input Mapping XML structure will be multi-layered when inserting into parent and child tables. The following XML is an example Input Mapping for two tables related by a foreign key:

<film action="upsert">
    <film_id key="true" upsert="film_id" ref="@@LAST_INSERT_ID" />
    <description />
    <language_id />
    <rating />
    <release_year type="date" />
    <title />
    <film_actor>
        <actor_id key="true" />
        <film_id key="true" ref="@film_id" />
        <last_update type="datetime" />
    </film_actor>
</film>

In the above example, ‘film’ is the parent table and ‘film_actor’ is the child table. The film_id column is the primary key of the ‘film’ table, and a foreign key in the ‘film_actor’ table. When the CData Connector processes an input XML file that matches this structure, it will update both the ‘film’ and ‘film_actor’ tables and ensure that the keys linking the two records are the same.
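The sequence of operations for the film/film_actor mapping can be sketched as follows. This is illustrative only; the actual statements (and the mechanism for retrieving the generated key) are determined by the CData driver for the data source, and @retrieved_film_id is a hypothetical placeholder for the value fetched in step 2:

```sql
-- 1. Upsert the parent record (columns from the mapping above):
INSERT INTO film (description, language_id, rating, release_year, title)
  VALUES (@description, @language_id, @rating, @release_year, @title);

-- 2. Retrieve the generated primary key (ref="@@LAST_INSERT_ID"):
SELECT LAST_INSERT_ID();

-- 3. Insert the child record, passing the retrieved key (ref="@film_id"):
INSERT INTO film_actor (actor_id, film_id, last_update)
  VALUES (@actor_id, @retrieved_film_id, @last_update);
```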

Output Mappings

The CData Connector represents the data stored in the target cloud application as though the data were stored in tables in a relational database. After connecting to the data source, the connector scans the data source for the available virtual tables and stored procedures. An Output Mapping represents a SELECT from one of these virtual tables.

Output Mappings are represented as XML, and when the CData Connector pulls data from the target data source, that data will be pushed out as XML files. These output files will have the same XML structure as the Output Mapping. The process of converting database responses to XML is detailed in the Output Mappings as XML section.

Creating an Output Mapping

Click the ‘Add mapping’ button (+) to create a new mapping. Select the target virtual table/view from the list of available tables in the data source to bring up the Mapping Editor panel. In the Mapping Editor, select the table columns that should be returned in the SELECT statement. For simple SELECTs, this is all that is required to generate a mapping.

After an Output Mapping is created, the query it represents can be executed by clicking the ‘Execute’ button (a black play button) next to the mapping name. Additionally, the mapping’s XML representation can be viewed by clicking the ‘Code’ button (</>) next to the mapping name.

Output Filters

The Mapping Editor includes a ‘Filters’ panel to define rules that restrict the records that are returned by the data source. Each rule corresponds to a condition in the WHERE clause of the resulting SELECT query, and compares a specified column against a specified value; only records with a matching column value will be pulled.

Rules are combined with AND and OR logical operators, and rules can be grouped together to preserve order-of-operations when many rules are applied.
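For example, a rule on GPA combined with a grouped pair of rules on Grade might be rendered into a query along these lines (a sketch using a hypothetical Students table with Grade and GPA columns; the actual query is built by the connector):

```sql
-- Sketch: rule "GPA equal 4.0" AND group ("Grade equal 11" OR "Grade equal 12")
SELECT * FROM Students WHERE GPA = 4.0 AND (Grade = 11 OR Grade = 12);
```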

Output Mappings as XML

Output Mappings are represented in XML, with the following structure: a parent element identifies the virtual table to SELECT from, and each child element corresponds to a column in the target table. The Output Mapping establishes the connection between database columns and XML elements, so that the connector can read values from database results and use them to populate XML documents.

The following XML is an example Output Mapping for a ‘Students’ table:

<Items>
    <Students selectQuery="SELECT * FROM `Students` WHERE `GPA` = 4.0">
        <StudentID key="true" />
        <Name/>
        <Grade type="int"/>
        <GPA/>
    </Students>
</Items>

In the above example, StudentID, Name, Grade, and GPA are each columns of the ‘Students’ table. When the Output Mapping is executed, the connector will run the specified selectQuery and generate an XML document for each record that is returned. The following is an example XML Output file for the above mapping:

<Items>
    <Students>
        <StudentID>12921</StudentID>
        <Name>Hermione Granger</Name>
        <Grade>9</Grade>
        <GPA>4.0</GPA>
    </Students>
</Items>

Batching Output

When querying for output, the connector can be configured to batch records together into a single document. The Max Records setting governs how many records should be included in a single message, and the Batch Size setting indicates how many Batch Messages should be included in a single Batch Group. For example, with Max Records set to 100 and Batch Size set to 10, a result set of 1,000 records produces ten Batch Messages of 100 records each, combined into a single Batch Group.

Overwriting the SELECT Query

The Output Mapping XML displays the SELECT query that will be run against the target data source. The connector builds this query based on the settings in the Output Mapping, but the query can be manually overwritten if fine-grained control over the query is required.

Edit the ‘selectQuery’ attribute of the element that shares a name with the virtual table and set it to any arbitrary SELECT query.

Only Process New or Changed Records

Output Mappings can be configured to only retrieve records that have been updated or added recently. The connector can use two approaches for detecting which records are new or modified.

The first approach requires a DateTime column that represents the last-modified time for records, and the second requires a column that explicitly stores a value representing whether a record should be processed or not.

Using a Last Modified Column

In the Mapping Editor, expand the Advanced settings at the bottom of the panel. If a DateTime column is detected, the option will be available to Use column for processing new or changed records. Set this to the appropriate DateTime column that represents the last-modified time for records.

Using a Custom Column

Output Mappings can be configured to update a column in the virtual table for records that are successfully pulled from the data source. This can be combined with a Filter rule that only SELECTs records with a column value indicating that it has not yet been pulled (or should be pulled again).

To enable column updating after processing, expand the Advanced settings at the bottom of the Mapping Editor. Enable Update column with value when rows are processed successfully, then specify which column should be updated and the value that should be set in this column.

For example, an Output Mapping might be configured to update the Processed column to ‘1’ after pulling a record. Then, a Filter rule could be applied that restricts records to only those where Processed is not equal to 1.

In this example, when the connector executes the mapping, it builds a SELECT statement that is restricted to records where Processed is not 1, creates an XML file for every record in the result set, and places those files in the Output folder for this connector. The Processed column is then set to ‘1’ for each record that was successfully pulled.
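This custom-column approach can be sketched with the following queries. The sketch is illustrative only (the actual statements are generated by the connector and driver), and the Orders table with Id and Processed columns is a hypothetical example:

```sql
-- 1. The Filter rule restricts the SELECT to unprocessed records:
SELECT * FROM Orders WHERE Processed <> 1;

-- 2. After each record is pulled successfully, the configured column is updated:
UPDATE Orders SET Processed = 1 WHERE Id = @Id;
```

On the next scheduled pull, the updated records no longer match the filter, so only new or reset records are retrieved.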

SELECTing From Multiple Tables

Output Mappings support querying multiple tables with a single mapping. When creating an Output Mapping with multiple tables, the tables should have a foreign key relationship to each other such that one table is the child of another table. Otherwise, use multiple Output Mappings to SELECT from tables that are not related through a foreign key.

To query child tables, first create an Output Mapping that targets the parent table. Then add child tables from within the Output Mapping Editor using the +Add button in the upper-left corner. Select the appropriate table from the list of tables and the designer will automatically populate with the hierarchical structure of tables. Select the columns of the child table that should be included in the Output Mapping.

Next, a WHERE filter should be applied to the child table mapping to reflect the foreign key relationship with the parent table.

Parent-Child Foreign Key Relationships

Typically, when SELECTing from a parent table and a child table, records should only be pulled from the child table if a foreign key matches the primary key from the parent table.

As an example, imagine an Output Mapping that pulls purchase order data from a parent table, ‘PurchaseOrders’, and a child table, ‘PurchaseOrderLineItems’. Each record (order) pulled from ‘PurchaseOrders’ should also pull the line items associated with that order. The two tables are related by a shared column, for example a PONumber column that is the primary key of the ‘PurchaseOrders’ table and a foreign key in the ‘PurchaseOrderLineItems’ table. Line item records should only be pulled from the child table if the line item’s PONumber matches the PONumber of the order being pulled from the parent table.

To establish this relationship in an Arc Output Mapping, add a new Filter to the child table mapping and configure it as follows:

  • In the first (left-hand) dropdown, select the column of the child table that is the foreign key to the parent table. In the above example, this would be the PONumber column of the ‘PurchaseOrderLineItems’ table.
  • In the middle (operator) dropdown, select ‘equal’.
  • In the last (right-hand) input box, first use the dropdown arrow to the right to select ‘REF’. Afterwards, the input box should have the $ prefix, indicating that it is referencing the parent table.
  • Use the last (right-hand) dropdown to select the parent table column that should match the child table’s foreign key. In the above example, this would be the PONumber column of the ‘PurchaseOrders’ table.

Parent-Child Output Mapping XML

The XML representation of an Output Mapping that includes a parent and a child table has a hierarchical structure to reflect the parent-child relationship. Both the query to SELECT from the parent table and the query to SELECT from the child table will be included in the XML:

<PurchaseOrders selectQuery="SELECT * FROM `PurchaseOrders`">
  <PONumber key="true"/>
  <AccountId/>
  <Amount/>
  <AppliedAmount/>
  <Balance/>
  <BillingCity/>
  <BillingCountry/>
  <BillingLine1/>
  <BillingPostalCode/>
  <BillingState/>
  <InvoiceLineItem selectQuery="SELECT * FROM `PurchaseOrderLineItems` WHERE `PONumber` = ${PONumber}">
    <Id key="true"/>
    <PONumber/>
    <ItemAmount/>
    <ItemName/>
    <ItemQuantity/>
  </InvoiceLineItem>
</PurchaseOrders>

The ‘${PONumber}’ syntax in the child table’s query indicates that the value is referencing a column (the PONumber column) from the parent table.

If multiple records are pulled from the ‘PurchaseOrderLineItems’ table, the InvoiceLineItem element (and all of its children) will appear multiple times in the output XML.

Input/Output Mappings

Some database operations require input and produce output. Input/Output Mappings provide a way to set an XML schema for both input files and output files for the same database operation.

Creating an Input/Output Mapping

Input/Output Mappings are created using the same approach as Input Mappings and Output Mappings. Each mapping targets a specific Stored Procedure, and the Mapping Editor wizard lets you view the expected parameters for the Stored Procedure.

Input Parameters

The input parameters for a Stored Procedure are defined by the database system, so no Mapping Editor configuration is required to select the relevant input parameters. The parameters advertised by the system are displayed in the Mapping Editor, and these cannot be de-selected or otherwise configured.

Input/Output Mappings as XML

Once an Input/Output Mapping has been created, an XML template is generated to match the required input parameters, and an XML template is generated to handle the output from the Stored Procedure.

Input XML

The XML template for input files follows the same principles as XML templates for Input Mappings; files that match the template XML structure are interpreted as input to the database operation. See Input Mappings as XML for more details.

Output XML

Output from a Stored Procedure is not defined in advance by the database system. As a result, it might be necessary to provide a test set of input parameters so the connector can invoke the procedure and detect the structure of the output.

To provide a set of sample inputs, use the Show Sample Data button in the top right of the Mapping Editor. After you provide a test set of inputs in the resulting modal, the connector displays the XML structure of the output as it was received. Copy this XML structure, or use the Save Sample Data button to make it available in other areas of the application, such as an XML Map Connector template.

Troubleshooting

This section contains information helpful in troubleshooting errors in the CData Connector. The first step is to understand at a technical level the two layers of translation that the CData Connector performs.

Technical Overview: The Two Layers

The CData Connector in Arc provides an interface between an XML document mapping and a query that is issued by the configured data driver. There are two layers of integration that take place with the data source configured in the CData connector.

At the first layer, the CData Connector creates an interface between an input or output mapping (defined in the settings tab) and one or more queries (e.g. SQL queries) that are issued with the data driver.

At the second layer, the CData driver attempts to process this query and translate it into an API call that is directly supported by the data source (some data sources use interfaces other than an API, in which case the driver performs the specific translation needed to communicate with that interface).

Troubleshooting errors in the CData connector involves an understanding of both layers. If an insert query fails because a required element is missing, for example, that is a failure at the first layer. If an input element is rejected because a required input validation check on a column fails, that is an indication of a failure at the second layer.

Troubleshooting the First Layer (Generating Queries)

The logs generated in the Input and Output tabs of the CData connector record the interaction between the mapping and the underlying data driver.

For an input mapping, you will see that the connector begins by loading the connection to the data source and resolving every input table in the incoming document (every XML element that contains children) against a corresponding mapping definition from the loaded input mappings.

Here, you will see that every detected element is matched against a table, and that table is resolved to a query with the underlying driver. From these logs, you will be able to determine what calls are made to the driver and in what sequence. The queries that are logged will be parameterized, so if you are missing any input columns or queries, this will be apparent in the logs that are recorded in the connector.

For an output mapping, you will see every query that is issued by the connector and, where child queries are passed a parameter from a parent query, the parameter that is passed.

Troubleshooting the Second Layer (Translating/Executing Queries)

Often, you will need to troubleshoot the interaction between the data driver and the data source. This becomes necessary whenever the query appears to be formed correctly, but an error message is returned that appears to come directly from the data source.

This can also be helpful when troubleshooting a connection to a data source, or when troubleshooting metadata queries (such as errors that appear when listing tables or table columns).

Logging for the driver is handled separately from the logging supported by the CData connector. To enable the log file for the underlying driver, you will need to set the log file as a parameter in the connection string of the driver. In general, any connection string properties supported by the driver are listed in the online documentation for the data driver on the CData website.

In almost all cases, a connection string setting of LogFile is supported, and a path to a log file on disk can be set in the Other Driver Settings field on the Advanced tab for the connector. This setting extends the connection string assembled for the driver by the CData connector, and accepts additional parameters as semicolon-separated name=value pairs, for example:

LogFile=C:\temp\CDataLog.txt;Verbosity=5;foo=bar

This log will pick up detailed information about the interface between the CData driver and the data source (including the queries that are issued in the API natively supported by the driver).