The PostgreSQL Connector allows you to integrate PostgreSQL into your data flow by pushing or pulling data from various PostgreSQL tables.
Configuring the PostgreSQL Connector follows a three-step process:
- Create (or select) a connection to PostgreSQL
- Select an Action to perform (Select, Upsert, etc)
- Select the Table(s) and columns within the Table(s) to perform the action against
After establishing a connection to the target database, the connector reads the list of tables/views present in the database.
Selecting the Action determines how the PostgreSQL Connector will interact with PostgreSQL data: Select to pull data into CData Arc, Upsert data from Arc to update PostgreSQL, etc. For more information, please see the Actions section.
Selecting the Table(s) and columns determines which specific sets of PostgreSQL data should be selected or updated. For more information please see the Tables and Columns section.
Establishing a Connection
The following connection properties are usually required in order to connect to PostgreSQL:
- Server The host name or IP of the server hosting the PostgreSQL database.
- User The user which will be used to authenticate with the PostgreSQL server.
- Password The password which will be used to authenticate with the PostgreSQL server.
You can also optionally set the following:
- Database The database to connect to when connecting to the PostgreSQL Server. If this is not set, the user’s default database will be used.
- Port The port of the server hosting the PostgreSQL database. 5432 by default.
The connector may provide secure communication with PostgreSQL server using SSL encryption. You can optionally turn on SSL encryption by setting
Use SSL to true in the Advanced tab.
You can also leverage SSL authentication to connect to PostgreSQL data. For that configure the following connection properties:
- SSLClientCert Set this to the name of the certificate store for the client certificate. Used in the case of 2-way SSL, where truststore and keystore are kept on both the client and server machines.
- SSLClientCertPassword If a client certificate store is password-protected, set this value to the store’s password.
- SSLClientCertSubject The subject of the TLS/SSL client certificate. Used to locate the certificate in the store.
- SSLClientCertType The certificate type of the client store.
- SSLServerCert The certificate to be accepted from the server.
The PostgreSQL Connector can interact with PostgreSQL in three ways:
- Select PostgreSQL for data to pull into the Arc flow
- Upsert data from the Arc flow into PostgreSQL
- Look Up values from PostgreSQL to update data within the Arc flow
- Execute a Stored Procedure with input from the flow and pass the result down the flow
The Select action retrieves data from PostgreSQL and brings it into Arc. When the PostgreSQL Connector is set to this Action, it functions as the beginning of an Arc flow.
Each Select is targeted at a specific PostgreSQL table. The PostgreSQL Connector will format data retrieved from PostgreSQL as an XML structure based on the tables and columns selected. These XML documents can be mapped/transformed using the XML Map Connector for further processing in the flow.
The Filter panel allows for adding filters to the Select, which function similarly to ‘WHERE’ clauses in SQL. Use the Add rule button to add new filters, and use Add group to group them together by logical operators (AND, OR).
The Upsert action either inserts new data into PostgreSQL or updates existing data in PostgreSQL. The connector will detect which value corresponds to the primary key for the selected table and automatically determine whether data should be inserted or updated.
When the PostgreSQL Connector is set to this Action, it functions as the end of an Arc Flow. Data that should end up in PostgreSQL should be retrieved, transformed, and formatted prior to the PostgreSQL Connector in the flow.
The Lookup action retrieves a value from PostgreSQL and inserts that value into an already-existing Arc message in the Flow. When the PostgreSQL Connector is set to this Action, it functions as a middle step in an Arc flow.
The Lookup Query determines what value the connector will retrieve from PostgreSQL, and should be formatted as a SQL query against the PostgreSQL tables.
The Lookup action can insert values into existing Arc messages in one of two ways:
- Message Header
When set to Header, the connector will insert the value retrieved from PostgreSQL as a header on the Arc Message. When set to XPath, the connector will insert the value retrieved from PostgreSQL directly into an XML document. The XPath determines which element within the document will contain the value after processing. Note that the XPath option requires that the message processed by the connector is an XML file.
Execute Stored Procedure
The Execute Stored Procedure action accepts data from the flow and uses it as input for the stored procedure call, for which the result is then passed down the flow.
The input parameters for a Stored Procedure are defined by the database system, so no configuration is required in the Mapping Editor to select the relevant input parameters. Whatever parameters are advertised by the system are displayed in the Mapping Editor, and these cannot be de-selected or otherwise configured.
The XML template for input files follows the same principles as XML templates for Input Mappings; files that match the template XML structure will be interpreted as input to the database operation.
Output from a Stored Procedure is not defined in advance by the database system. As a result, it may be necessary to provide a test set of input parameters so that the connector can invoke the procedure and detect the structure of the output.
To provide a set of sample inputs, use the Show Sample Data button. After providing a test set of inputs in the resulting modal, the connector will display the XML structure of the output as it received it. Copy this XML structure or use the Save Sample Data button to use this XML structure in other aspects of the application, such as an XML Map Connector template.
Tables and Columns
After selecting an Action, the connector requires selecting a target table (or tables) within PostgreSQL. The connector will automatically read the available tables and make them available in a dropdown list. The selected Action (Select, Upsert, Lookup) will target the specified table.
Once a table is chosen, individual columns from that table can be excluded by unchecking the checkbox in the columns pane. The connector will automatically detect values that have special relevance (e.g. the primary key, foreign keys, etc).
Tables and Columns as XML
The connector models each table as an XML document with a standard structure. Understanding this XML is important for building appropriate input files and handling output files generated by the connector. Additionally, some advanced configuration options are enabled by modifying the XML model.
More information can be found in the XML Models section.
Advanced Connector Configuration
The Advanced tab within the Connector pane allows for further configuration.
Uncommon settings for advanced configuration.
- Auto Truncate When enabled, the connector will automatically truncate values that are longer than the allowed limit in PostgreSQL when inserting.
- Command Timeout The length of time the connector will wait for a response from a command before throwing a Timeout error.
- Flat Mapping Scheme This setting allows Flat files to be processed for Upserts. After creating an Upsert model in the Tables and Columns panel, this setting associates a specific Upsert model with the names of your Flat files. This setting should be a semi-colon-delimited list of associations between Upsert models and file name patterns (using glob matching). The name of the Upsert model comes from the name of the target Table within that model.
For example, setting this field to companyA*=Invoice would cause Flat files that start with ‘companyA’ in the file name to be processed according to the ‘Invoice’ Upsert model.
- Log Level The verbosity of logs generated by the connector. When requesting support, it is recommended to set this field to Debug.
- Local File Scheme A filemask for determining local file names after they are processed by the connector. The following macros may be used to reference contextual information:
%ConnectorId%, %Filename%, %FilenameNoExt%, %Ext%, %ShortDate%, %LongDate%, %RegexFilename:%, %DateFormat:%, %ControlNumber%, %TransactionControlNumber%, %TransactionCode%, %StandardVersion%.
As an example: %FilenameNoExt%_%ControlNumber%%Ext%
- Log Messages Whether the log entry for a processed file will include a copy of the file itself.
- Send Filter A filemask for determining which files should not be processed by the connector (files that match the glob pattern filemask will be processed).
- Save to Sent Folder Whether files processed by the connector should be copied to the Sent folder for the connector.
- Output File Format The format of the file pushed out of the connector after a Select.
- Batch Input Size The maximum number of input queries to include in the same batch.
- Batch Size The maximum number of Batch Messages to include in a Batch Group.
- Max Records The maximum number of records to include in a single output message. A value of -1 may be used to indicate that all output records should be placed in a single file, and a value of 0 indicates that the connector may decide based on the configured Output File Format (XML will output 1 record per file by default, while flat file formats will include all records in one file by default).
- Transaction Size The maximum number of queries allowed in a transaction.
- Parent Connector The connector from which settings should be inherited, unless explicitly overwritten within the existing connector configuration. Must be set to a connector of the same type as the current connector.
- Log Subfolder Scheme Instructs the connector to group files in the Logs folder according to the selected interval. For example, the Weekly option instructs the connector to create a new subfolder each week and store all logs for the week in that folder. The blank setting tells the connector to save all logs directly in the Logs folder. For connectors that process many transactions, using subfolders can help keep logs organized and improve performance.
Settings for specific use cases.
- Other Settings Allows configuration of hidden connector settings in a semicolon-separated list, like
setting1=value1;setting2=value2. Normal connector use cases and functionality should not require use of these settings.
The PostgreSQL Connector models PostgreSQL tables/views as XML with the following structure: a parent element identifies the table to insert into, and each child element corresponds to a column in the target table. This establishes a connection between XML elements and PostgreSQL values, so that the connector can read values from incoming XML documents and use them in INSERT or UPDATE queries.
The XML Model for a table can be viewed by clicking the ‘Code’ button in the top-right of the configuration pane.
The following XML is a simplified example model for the ‘Account’ table:
<Items> <Account action="upsert"> <Id key="true" /> <account_c /> <AccountNumber /> <AccountSource /> </Account> </Items>
In the above example, Id, account_c, AccountNumber, and AccountSource are each columns of the Account table. When the connector processes an XML file that matches this structure, it will automatically insert the values from that XML into the Account table. When the connector queries the Account table, it will output XML that matches this structure.
The PostgreSQL Connector supports batching to improve performance when inserting large sets of data. Batching is configured via two fields, TransactionSize and BatchInputSize. When these fields are set to positive integers, the connector will insert [BatchInputSize] records in a single operation (batch), and will insert [TransactionSize] records in a single connection (transaction).
An XML input file can contain more records than a single transaction size, in which case the connector will insert the data over multiple transactions.
If an error occurs during a batch insert, the connector will rollback the transaction and cache the index of the first record in the failed transaction. When the input file is retried, only records after the cached index will be inserted into PostgreSQL.
When querying for output, the connector may be configured to support batching records together into a single document. The Max Records setting may be used to govern how many records should be included in a single message, and the Batch Size setting may be used to indicate how many Batch Messages should be included in a single Batch Group. More information on Batch Groups and Batch Messages can be found here.
The PostgreSQL Connector supports Upserting into and Selecting multiple tables with a single action. When creating an table model with multiple tables, the tables should have a foreign key relationship to each other such that one table is the child of another table.
To interface with child tables, first create a model that targets the parent table. Then add child tables from within the configuration panel using the +Add button. Select the child table from the list of tables and the designer will automatically populate with the hierarchical structure of tables. Select the columns of the child table that should be included in the model.
Child Table XML
The Input Mapping XML structure will be multi-layered when inserting into parent and child tables. The following XML is an example Input Mapping for two tables related by a foreign key:
<Account action="upsert"> <Id key="true" /> <account_c /> <AccountNumber /> <AccountSource /> <AccountPartner> <Id key="true" /> <AccountFromId ref="@film_id" /> <IsPrimary /> <OpportunityId /> <Role /> </AccountPartner> </Account>
In the above example, ‘Account’ is the parent table and ‘AccountPartner’ is the child table. When the PostgreSQL Connector processes an input XML file that matches this structure, it will update both the ‘Account’ and ‘AccountPartner’ table and ensure that the keys linking the two records are the same.
Some columns within child tables will be marked with a blue ‘REF’ tag. The connector has detected that these columns correspond to the primary key in the parent table. After making the insert into the parent table, and before making the insert into the child table, the connector will query the parent table for the primary key of the record it just inserted. The connector will then use this value when inserting into the child table.
This is useful when the primary key of the parent table is generated by PostgreSQL. In this case, this column is not passed in as input XML, so the connector does not know the value of the column until after it has INSERTed the record into the parent table.
In the XML model, the ‘ref’ attribute indicates that a column can be referenced later or that the value for a column should be read from another column in the mapping. In order to reference the value of columns that are generated by PostgreSQL, the column element should include a ‘ref’ attribute set to the reserved value LAST_INSERT_ID:
<Item action="upsert"> <Id key="true" upsert="id" ref="@@LAST_INSERT_ID" />
This indicates that the id value will be retrieved after the insert and can be referenced later (usually in a child table). To reference this value later, set the ‘ref’ element to the name of the relevant column:
<id key="true" ref="@id" />
In the above example, id is the name of the primary key in the parent table, and also the name of a column in the child table. The ‘ref’ syntax instructs the connector to retrieve the id from the parent table after inserting, and using that retrieved value as the id column in the child table.
Selecting From Child Tables
Typically, when Selecting from a parent table and child table, records should only be pulled from the child table if a foreign key matches the primary key from the parent table.
As an example, imagine a Select that pulls purchase order data from a parent table, ‘PurchaseOrders’, and a child table, ‘PurchaseOrderLineItems’. Each record (order) pulled from ‘PurchaseOrders’ should also pull the line items associated with that order. The two tables are related by a shared column, for example a PONumber column that is the primary key of the ‘PurchaseOrders’ table and a foreign key in the ‘PurchaseOrdersLineItems’ table. Line item records should only be pulled from the child table if the line item’s PONumber matches the PONumber of the order being pulled from the parent table.
To establish this relationship in Arc, add a new Filter to the Select and configure it as follows:
- In the first (left-hand) dropdown, select the column of the child table that is the foreign key to the parent table. In the above example, this would be the PONumber column of the ‘PurchaseOrderLineItems’ table.
- In the middle (operator) dropdown, select ‘equal’
- In the last (right-hand) input box, first use the dropdown arrow to the right to select ‘REF’. Afterwards, the input box should have the $ prefix, indicating that it is referencing the parent table
- Use the last (right-hand) dropdown to select the parent table column that should match the child table’s foreign key. In the above example, this would be the PONumber column of the ‘PurchaseOrders’ table.
Only Process New or Changed Records
Select Actions can be configured to only retrieve records that have been updated or added recently. The connector can use two approaches for detecting which records are new or modified.
The first approach requires a DateTime column that represents the last-modified time for records, and the second requires a column that explicitly stores a value representing whether a record should be processed or not.
Using a Last Modified Column
Find the Advanced settings at the bottom of the configuration panel. If a DateTime column is detected, the option will be available to Use column for processing new or changed records. Set this to the appropriate DateTime column that represents the last-modified time for records.
Using a Custom Column
Select Actions can be configured to update a column in the table for records that are successfully pulled from PostgreSQL. This can be combined with a Filter rule that only SELECTs records with a column value indicating that it has not yet been pulled (or should be pulled again).
To enable column updating after processing, find Advanced settings at the bottom of the Mapping Editor. Enable Update column with value when rows are processed successfully, then specify which column should be updated and the value that should be set in this column.
For example, an Output Mapping might be configured to update the Processed column to ‘1’ after pulling a record. Then, a Filter rule could be applied that restricts records to only those where Processed not equal 1.
In this example, when the connector processes the Select template, it creates a SELECT statement based on the template. The template would execute “SELECT Id, AccountId, Amount FROM INVOICE”, create an XML file for every record in the result set based on the template, and place those files in the Output folder for this connector.