Apache Kafka - CData Cloud

Overview

CData Cloud offers access to Apache Kafka across several standard services and protocols, in a cloud-hosted solution. Any application that can connect to a SQL Server database can connect to Apache Kafka through CData Cloud.

CData Cloud allows you to standardize and configure connections to Apache Kafka as though it were any other OData endpoint or standard SQL Server.

Key Features

Full SQL Support: Apache Kafka appears as a standard relational database, allowing you to perform operations such as Filter, Group, and Join using standard SQL, regardless of whether these operations are supported by the underlying API.
CRUD Support: Both read and write operations are supported, restricted only by security settings that you can configure in Cloud or downstream in the source itself.
Secure Access: The administrator can create users and define their access to specific databases, granting either read-only access or full read and write privileges.
Comprehensive Data Model & Dynamic Discovery: CData Cloud provides comprehensive access to all of the data exposed in the underlying data source, including full access to dynamic data and easily searchable metadata.

CData Cloud

Getting Started

This page provides a guide to Establishing a Connection to Apache Kafka in CData Cloud, as well as information on the available resources, and a reference to the available connection properties.

Connecting to Apache Kafka

Establishing a Connection shows how to authenticate to Apache Kafka and configure any necessary connection properties to create a database in CData Cloud.

Accessing Data from CData Cloud Services

Accessing data from Apache Kafka through the available standard services and CData Cloud administration is documented in further detail in the CData Cloud Documentation.

Establishing a Connection

Connect to Apache Kafka by selecting the corresponding icon in the Database tab. Required properties are listed under Settings. The Advanced tab lists connection properties that are not typically required.

Connecting to Apache Kafka

.NET-based editions rely on the Confluent.Kafka and librdkafka libraries to function. These assemblies are bundled with the installer and automatically installed alongside the Cloud. If you are using a different installation method, make sure to install Confluent.Kafka 2.10.0 from NuGet along with its dependencies.

To specify the address of your Apache Kafka server, use the BootstrapServers parameter.

By default, the Cloud communicates with the data source in PLAINTEXT, which means that all data is sent unencrypted. If you want communication to be encrypted:

Configure the Cloud to use SSL encryption by setting UseSSL to true.
Configure SSLServerCert and SSLServerCertType to load the server certificates.

Note: Proxy settings like ProxyServer and firewall settings like FirewallServer do not affect the connection to the Apache Kafka broker because the Cloud connects to Apache Kafka internally using the official libraries, which do not support proxies. These options are only used when the Cloud connects to the schema registry. For details, see Extracting Metadata From Topics.

Authenticating to Apache Kafka

The Apache Kafka data source supports the following authentication methods:

Anonymous
Plain
SCRAM login module
SSL client certificate
Kerberos

Anonymous

Certain on-premises deployments of Apache Kafka accept connections without any authentication connection properties set. Such connections are said to be anonymous.

To authenticate anonymously, set this property:

AuthScheme: None.

SASL Plain

Plain authentication employs a plain text login module for authentication.

Set these properties:

AuthScheme: Plain.
User: The authenticating user.
Password: The authenticating user's password.

SCRAM Login Module

To authenticate using a SCRAM login module, set these properties:

AuthScheme: Specify SCRAM to use the SCRAM login module with SHA-256 hashing, or SCRAM-SHA-512 to use the SCRAM login module with SHA-512 hashing.
User: The authenticating user.
Password: The authenticating user's password.

SSL Client Certificate

To authenticate using an SSL client certificate, set these properties:

AuthScheme: SSLCertificate.
SSLClientCert: The SSL client certificate used to connect to the Apache Kafka broker.
SSLClientCertType: The format of the SSL client certificate used to connect to the Apache Kafka broker: PEMKEY_FILE (default), JKSFILE, or PEMKEY_BLOB.

Kerberos

Authenticating via Kerberos requires you to specify the system Kerberos configuration file. Set these properties:

AuthScheme: KERBEROS.
KerberosServiceName: The principal name of the Kafka brokers. For example, if the principal is kafka/[email protected], the KerberosServiceName is kafka.
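
As a minimal sketch of how the service name relates to the principal, the following snippet takes the part of a service principal that precedes the first slash. The helper name and the sample principal are hypothetical placeholders used only for illustration; they are not part of the Cloud.

```python
def kerberos_service_name(principal: str) -> str:
    """Return the service (primary) part of a service principal.

    A service principal has the form primary/instance@REALM.
    """
    primary, sep, _rest = principal.partition("/")
    if not sep or not primary:
        raise ValueError(f"not a service principal: {principal!r}")
    return primary


# Hypothetical principal, used only for illustration.
service_name = kerberos_service_name("kafka/broker1.example.com@EXAMPLE.COM")
print(service_name)
```

The resulting value is what you would supply as KerberosServiceName.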

Kafka OpenID Connect

The Cloud supports a limited form of OAuth provided by the Apache Kafka client libraries. Unlike the authentication schemes connected to Azure Event Hubs and GCP Kafka, this authentication scheme is not tied to a specific identity provider (IdP). It only requires that your IdP support the client credentials grant type and use JWTs as its access token format. If your IdP meets these requirements, you can register an OAuth application with it for use with the Cloud.

Note: Both your IdP and your Apache Kafka broker can be configured in a wide variety of ways. You should be aware of the requirements on both ends when connecting with the Cloud. For further information, refer to the documentation for your Apache Kafka broker and your IdP.

Identity Provider: IdPs generally require registering before an application can authenticate. Registering a client credentials application provides you with a client ID, client secret, and a token URL that accepts these values. Your IdP may also require specific scopes to generate an access token that the Apache Kafka broker accepts.
Apache Kafka Broker: The Apache Kafka broker supports several options for validating access tokens. The most important of these for the Cloud are the audience, issuer, and signing key that apply to the access token. If the access token does not have a matching audience or issuer, or is signed with an unrecognized key, the broker rejects the connection. In this situation, check the broker logs to determine why the validation failed. The broker does not tell the Cloud why the validation failed, so client logs are of limited use.
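
When the broker rejects a token, it can help to look at the issuer and audience claims that your IdP actually put in it. The following is a minimal sketch that decodes the payload of a JWT without verifying its signature. The helper names are hypothetical, and the sample token is built locally from made-up values. This is a debugging aid only; it does not reflect how the Cloud or the broker validates tokens.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments use URL-safe base64 with the padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _segment(obj: dict) -> str:
    """Encode a dict as an unpadded URL-safe base64 JSON segment."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Made-up token for illustration; the signature segment is a dummy value.
sample = ".".join([
    _segment({"alg": "RS256", "typ": "JWT"}),
    _segment({"iss": "https://idp.example.com", "aud": "kafka-broker"}),
    "signature",
])

claims = jwt_claims(sample)
print(claims["iss"], claims["aud"])
```

Comparing the printed issuer and audience with the values the broker expects narrows down which check is likely failing.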

Set these properties to authenticate:

AuthScheme: KafkaOAuthClient.
OAuthAccessTokenURL: The URL where your IdP accepts OAuth token requests.
OAuthClientId: The client ID of the application you registered with your IdP.
OAuthClientSecret: The client secret of the application you registered with your IdP.
Scope (optional): If your IdP requires specific scopes to retrieve a valid access token, set them here.
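
The property lists in this section can be summarized as a lookup from AuthScheme to the properties that must accompany it. The following sketch checks a set of properties for gaps before you enter them. The function name, the sample host, and the exact-match handling of AuthScheme values are assumptions made for illustration; the Cloud performs its own validation.

```python
# Properties that must accompany each AuthScheme, as listed in this section.
REQUIRED = {
    "None": [],
    "Plain": ["User", "Password"],
    "SCRAM": ["User", "Password"],
    "SCRAM-SHA-512": ["User", "Password"],
    "SSLCertificate": ["SSLClientCert"],
    "KERBEROS": ["KerberosServiceName"],
    "KafkaOAuthClient": [
        "OAuthAccessTokenURL",
        "OAuthClientId",
        "OAuthClientSecret",
    ],
}


def missing_properties(props: dict) -> list:
    """Return the names of required properties that are absent or empty."""
    scheme = props.get("AuthScheme")
    if scheme not in REQUIRED:
        raise ValueError(f"unknown or missing AuthScheme: {scheme!r}")
    needed = ["BootstrapServers"] + REQUIRED[scheme]
    return [name for name in needed if not props.get(name)]


# Hypothetical values, used only for illustration.
props = {
    "BootstrapServers": "broker.example.com:9092",
    "AuthScheme": "SCRAM-SHA-512",
    "User": "alice",
}
print(missing_properties(props))
```

In this example the Password property is reported as missing.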

Connecting to Azure Event Hubs

The Cloud supports connecting to Azure Event Hubs using OAuth and shared-access signatures. Before you begin, check that your Event Hubs namespace supports connections using the Kafka protocol. The Cloud requires this feature and it may not be available for certain pricing tiers.

In addition to the scheme-specific properties covered below, all connections to Azure require you to set the following properties:

BootstrapServers: mynamespace.servicebus.windows.net:9093.
UseSSL: True.

Entra ID (Azure AD)

Microsoft Entra ID is a multi-tenant, cloud-based identity and access management platform. It supports OAuth-based authentication flows that enable the Cloud to access Apache Kafka endpoints securely.

Authentication to Entra ID via a web application always requires that you first create and register a custom OAuth application. This enables your application to define its own redirect URI, manage credential scope, and comply with organization-specific security policies.

For full instructions on how to create and register a custom OAuth application, see Creating an Entra ID (Azure AD) Application.

After setting AuthScheme to AzureAD, the steps to authenticate vary, depending on the environment. For details on how to connect from desktop applications, web-based workflows, or headless systems, see the following sections.

Azure Service Principal

Note: Microsoft has rebranded Azure AD as Entra ID. In topics that require the user to interact with the Entra ID Admin site, we use the same names Microsoft does. However, there are still CData connection properties whose names or values reference "Azure AD".

Azure Service Principal is role-based, application-level authentication. This means that authentication is done per application, rather than per user. All tasks taken on by the application are executed without a default user context, based on the assigned roles. The application's access to resources is controlled through the permissions of its assigned roles.

For information about how to set up Azure Service Principal authentication, see Creating a Service Principal App in Entra ID (Azure AD).

Managed Service Identity (MSI)

If you are running Apache Kafka on an Azure VM and want to automatically obtain Managed Service Identity (MSI) credentials to connect, set AuthScheme to AzureMSI.

User-Managed Identities

To obtain a token for a managed identity, use the OAuthClientId property to specify the managed identity's client_id.

If your VM has multiple user-assigned managed identities, you must also specify OAuthClientId.

Shared-Access Signature

The Cloud supports password-based authentication using shared-access signatures. After you create the shared secret, set these properties:

AuthScheme: Plain.
User: $ConnectionString.
Password: The Event Hubs connection string from the Shared Access Policies screen.
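
As a rough sketch, the properties for a shared-access signature connection can be derived from the Event Hubs connection string itself. The function name and the sample connection string below are placeholders, and the sketch assumes the usual Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=... layout of that string. Note that the User value is the literal text $ConnectionString, not a variable to substitute.

```python
from urllib.parse import urlparse


def event_hubs_sas_properties(connection_string: str) -> dict:
    """Derive connection properties from an Event Hubs connection string."""
    # Split on the first '=' only: the key value may itself end in '='.
    fields = dict(
        part.split("=", 1) for part in connection_string.split(";") if part
    )
    host = urlparse(fields["Endpoint"]).hostname
    return {
        "BootstrapServers": f"{host}:9093",
        "UseSSL": "True",
        "AuthScheme": "Plain",
        # Literal user name expected for this scheme.
        "User": "$ConnectionString",
        "Password": connection_string,
    }


# Placeholder connection string, used only for illustration.
sample = (
    "Endpoint=sb://mynamespace.servicebus.windows.net/;"
    "SharedAccessKeyName=mypolicy;"
    "SharedAccessKey=PLACEHOLDERKEY="
)
sas_props = event_hubs_sas_properties(sample)
for name, value in sas_props.items():
    print(f"{name}: {value}")
```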

Connecting to GCP Kafka

The Cloud supports connecting to Google Managed Service for Apache Kafka (GCP Kafka). GCP Kafka uses OAuth authentication and supports service accounts, GCP instance accounts, and Workload Identity Federation.

All connections to GCP Kafka must set these properties:

BootstrapServers: bootstrap.myclustername.myregion.managedkafka.mygcpproject.cloud.goog:9092. This value is listed on the Cluster Configuration page, under the Configurations tab.
UseSSL: True.

After you set the appropriate scheme-specific properties (described below), you are ready to connect.

Authenticating to GCP Kafka

You can authenticate to GCP Kafka as a Google service account, a GCP instance account, or using Workload Identity Federation credentials.

Service Account

GCP Kafka supports authenticating as a Google service account. This service account must have the Managed Kafka Client role.

Provide the service account credentials to the Cloud with these properties:

AuthScheme: OAuthJWT.
OAuthJWTCertType: GOOGLEJSON.
OAuthJWTCert: The path of the JSON file containing the service account credentials.

GCP Instance Account

GCP Kafka supports connections using GCP instance accounts. This requires your Compute Engine instance to have a service account with the Managed Kafka Client role. The instance must also enable the Cloud Platform scope within the Cloud API Access Scopes.

To connect using a GCP instance account, set this property:

AuthScheme: GCPInstanceAccount.

Workload Identity Federation Credentials

GCP Kafka supports connections using Workload Identity Federation credentials. However, it only supports these accounts via delegation with the RequestingServiceAccount property. As with normal service accounts, the delegated service account must have the Managed Kafka Client role.

To connect using Workload Identity Federation credentials, set these properties:

AuthScheme: AWSWorkloadIdentity.
AWSWorkloadIdentityConfig: Varies depending on how you authenticate to AWS.
RequestingServiceAccount: The email of the service account to which the AWS principal can delegate.

SSL Configuration

Customizing the SSL Configuration

To enable TLS, set UseSSL to True.

With this configuration, the Cloud attempts to negotiate TLS with the server. The server certificate is validated against the default system trusted certificate store. You can override how the certificate gets validated using the SSLServerCert connection property.

To specify another certificate, see the SSLServerCert connection property.

Client SSL Certificates

The Cloud also supports client certificates. Set the following properties to connect using a client certificate:

SSLClientCert: The name of the certificate store for the client certificate.
SSLClientCertType: The type of key store containing the TLS/SSL client certificate.
SSLClientCertPassword: The password for the TLS/SSL client certificate.
SSLClientCertSubject: The subject of the TLS/SSL client certificate.

Firewall and Proxy

Connecting Through a Firewall or Proxy

HTTP Proxies

To authenticate to an HTTP proxy, set the following:

ProxyServer: the hostname or IP address of the proxy server that you want to route HTTP traffic through.
ProxyPort: the TCP port that the proxy server is running on.
ProxyAuthScheme: the authentication method the Cloud uses when authenticating to the proxy server.
ProxyUser: the username of a user account registered with the proxy server.
ProxyPassword: the password associated with the ProxyUser.

Other Proxies

Set the following properties:

To use a proxy-based firewall, set FirewallType, FirewallServer, and FirewallPort.
To tunnel the connection, set FirewallType to TUNNEL.
To authenticate, specify FirewallUser and FirewallPassword.
To authenticate to a SOCKS proxy, additionally set FirewallType to SOCKS5.

Data Model

Tables

The CData Cloud dynamically models Apache Kafka topics as tables. A complete list of discovered topics can be obtained from the sys_tables system table.

Selecting from a topic returns the messages that already exist on the topic, as well as live messages that are posted before the number of seconds specified by ReadDuration has elapsed.

Stored Procedures

Stored Procedures are function-like interfaces to Apache Kafka. They can be used to create schema files, commit messages, and more.

Consumer Groups

Connections that the Cloud makes to Apache Kafka are always part of a consumer group. You can control the consumer group by setting a value for the ConsumerGroupId connection property. Using the same consumer group ID across multiple connections puts those connections into the same consumer group. The Cloud generates a random consumer group ID if one is not provided.

All members of a consumer group share an offset that determines what messages are read next within each topic and partition. The Cloud supports two ways of updating the offset:

If AutoCommit is enabled, the Cloud periodically commits the offset for any topics and partitions that have been read by SELECT queries. The exact interval is determined by the auto-commit properties in the native library. See ConsumerProperties for details on how to configure these properties.
The CommitOffset stored procedure stores the offset of the last item read by the current query. Note that this must be called while the query resultset is still open. The Cloud resets the offset when the resultset is closed.

If there is no existing offset, the Cloud uses the OffsetResetStrategy to determine what the offset should be. This may happen if the broker does not recognize the consumer group or if the consumer group never committed an offset.
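
The interaction between consumer groups, committed offsets, and the reset strategy can be illustrated with a toy in-memory model. This is only a sketch: the class and function names are hypothetical, the topic has a single partition, and the strategy names "earliest" and "latest" are borrowed from Kafka's auto.offset.reset consumer setting, which may not match the values that OffsetResetStrategy accepts.

```python
class OffsetStore:
    """Toy in-memory model of committed offsets per consumer group."""

    def __init__(self):
        # (group, topic, partition) -> next offset to read
        self._committed = {}

    def commit(self, group, topic, partition, next_offset):
        self._committed[(group, topic, partition)] = next_offset

    def start_offset(self, group, topic, partition, log_length, reset):
        key = (group, topic, partition)
        if key in self._committed:
            return self._committed[key]
        # No committed offset for this group: apply the reset strategy.
        if reset == "earliest":
            return 0
        if reset == "latest":
            return log_length
        raise ValueError(f"unknown reset strategy: {reset!r}")


def read(store, group, log, limit=None, reset="earliest"):
    """Read from a single-partition topic; return (messages, next_offset)."""
    start = store.start_offset(group, "topic", 0, len(log), reset)
    end = len(log) if limit is None else min(len(log), start + limit)
    return log[start:end], end


log = ["m0", "m1", "m2", "m3"]
store = OffsetStore()

first, next_offset = read(store, "x", log, limit=2)
store.commit("x", "topic", 0, next_offset)

# Same group: resumes after the committed offset.
second, _ = read(store, "x", log)
# New group with nothing committed: the reset strategy decides.
fresh, _ = read(store, "y", log, reset="latest")

print(first, second, fresh)
```

Two connections that share group x continue where the commit left off, while the connection in group y falls back to the reset strategy because nothing was ever committed for it.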

Bulk Messages

The Cloud supports reading bulk messages from topics using the CSV, JSON, or XML SerializationFormat. When the Cloud reads CSV data like the following block, it splits the CSV and outputs each line as a separate row. The values of other columns like the partition, timestamp, and key are the same across each row.

"1","alpha"
"2","beta"
"3","gamma"

Bulk messages are not supported for key values. When MessageKeyType is set to a bulk format, the Cloud reads only the first row of the key and ignores the rest. For example, when the Cloud reads the above CSV data as a message key, the entries on the alpha row are repeated across every bulk row from the message value. The entries on the beta and gamma rows are lost.
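
The splitting behavior described above can be approximated with Python's csv module. In this sketch the function name and the dictionary keys are illustrative only and do not correspond to the column names the Cloud exposes. Each line of the message value becomes its own row, while only the first line of the key is kept.

```python
import csv
import io


def split_bulk_message(value: str, key: str, partition: int) -> list:
    """Split a CSV message value into rows; keep only the key's first row."""
    value_rows = list(csv.reader(io.StringIO(value)))
    key_rows = list(csv.reader(io.StringIO(key)))
    first_key = key_rows[0] if key_rows else []
    return [
        {"partition": partition, "key": first_key, "value": row}
        for row in value_rows
    ]


data = '"1","alpha"\n"2","beta"\n"3","gamma"\n'
# The same CSV is used as both key and value to show the difference.
rows = split_bulk_message(value=data, key=data, partition=0)
for row in rows:
    print(row)
```

Every output row carries the alpha key entries, and the beta and gamma key entries never appear.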

Bulk Limitations

Apache Kafka does not natively support bulk messages, which can lead to rows being skipped in some circumstances. For example:

1. A Cloud connection is created with ConsumerGroupId=x.
2. The connection executes the query SELECT * FROM topic LIMIT 3.
3. The connection commits its offset and closes.
4. Another connection is created with the same ConsumerGroupId.
5. The connection executes the query SELECT * FROM topic.

Consider what happens if this procedure is performed on the following topic. The first connection consumes all rows from the first message and one row from the second. However, the Cloud has no way to report to Apache Kafka that only part of the second message was read. This means that step 3 commits the offset 3 and the second connection starts on row 5, skipping row 4.

"row 1"
"row 2"
/* End of message 1 */

"row 3"
"row 4"
/* End of message 2 */

"row 5"
"row 6"
/* End of message 3 */
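
The skipped row can be reproduced with a small simulation. This sketch is not the Cloud's implementation: the function name is hypothetical, and positions are counted as zero-based message indexes, which is a simplification and may not match the offset values the broker reports. The key point is that a commit can only record whole messages.

```python
messages = [
    ["row 1", "row 2"],  # message 1
    ["row 3", "row 4"],  # message 2
    ["row 5", "row 6"],  # message 3
]


def select(messages, start_message, limit=None):
    """Return (rows, next_message), where next_message is the first
    message that was not touched by this query."""
    rows = []
    next_message = start_message
    for index in range(start_message, len(messages)):
        if limit is not None and len(rows) >= limit:
            break
        remaining = None if limit is None else limit - len(rows)
        rows.extend(messages[index][:remaining])
        # A touched message counts as consumed, even if only partly read.
        next_message = index + 1
    return rows, next_message


# First connection: LIMIT 3, then commit.
first_rows, committed = select(messages, 0, limit=3)
# Second connection: same consumer group, resumes at the committed position.
second_rows, _ = select(messages, committed)

all_rows = [row for message in messages for row in message]
skipped = [row for row in all_rows if row not in first_rows + second_rows]
print(first_rows, second_rows, skipped)
```

The first query returns rows 1 through 3, the second returns rows 5 and 6, and row 4 is never delivered to either connection.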

Stored Procedures

Stored procedures are function-like interfaces that extend the functionality of the Cloud beyond simple SELECT/INSERT operations with Apache Kafka.

Stored procedures accept a list of parameters, perform their intended function, and then return any relevant response data from Apache Kafka, along with an indication of whether the procedure succeeded or failed.

CData Cloud - Apache Kafka Stored Procedures


Name	Description
CommitOffset	Commits the current set of message offsets for a specified consumer group. This stored procedure ensures that consumed messages are marked as processed within Kafka, preventing the same records from being read again when the consumer resumes operation.
CreateTopic	Creates a new topic on the Kafka broker. This stored procedure allows you to define key topic parameters (for example, the number of partitions and replication factor) and initialize new message streams for publishing and consumption.
DeleteTopic	Deletes an existing topic from the Kafka broker. This stored procedure permanently removes the topic and all of its messages, so it should be used with caution in production environments.
GetAdminConsentURL.rsb	Returns the administrator consent URL that must be opened in a web browser to authorize the application for domain-level access.
ProduceMessage	Publishes a raw message to a specified Kafka topic. This stored procedure enables data producers to send serialized message payloads (for example, JSON or Avro) directly into the message stream for downstream consumers to process.

CommitOffset

Commits the current set of message offsets for a specified consumer group. This stored procedure ensures that consumed messages are marked as processed within Kafka, preventing the same records from being read again when the consumer resumes operation.

Result Set Columns


Name	Type	Description
Success	String	Specifies whether the offset commit operation is successful. This output returns a value of 'true' when the stored procedure successfully commits the consumer group offsets to Kafka, and a value of 'false' when the commit fails due to connection issues or broker errors.

CreateTopic

Creates a new topic on the Kafka broker. This stored procedure allows you to define key topic parameters (for example, the number of partitions and replication factor) and initialize new message streams for publishing and consumption.

Input


Name	Type	Required	Description
Topic	String	True	Specifies the name of the Kafka topic to create. This input defines the identifier for the new message stream that producers can publish to and consumers can subscribe to after creation.
Partitions	String	True	Specifies the number of partitions to assign to the topic. This value determines how message data is distributed across brokers and affects parallel processing and scalability.
ReplicationFactor	String	True	Specifies the number of replicas that are maintained for each partition. The value cannot exceed the number of brokers in the Kafka cluster, and it determines the level of fault tolerance and data redundancy.

Result Set Columns


Name	Type	Description
Success	String	Returns a value of 'true' when the stored procedure successfully creates the Kafka topic and a value of 'false' when the operation fails because of configuration or broker-level errors.
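
The following sketch builds a CreateTopic call after a few sanity checks. It assumes the EXEC form with named inputs that this document uses for ProduceMessage, and it quotes the numeric inputs because they are declared as String. Both are assumptions. The function name is hypothetical, the broker count must be supplied by the caller, and the topic name rule (ASCII letters, digits, period, underscore, and hyphen, up to 249 characters) is Kafka's own.

```python
import re

# Kafka topic names: ASCII letters, digits, '.', '_' and '-', max 249 chars.
_TOPIC_NAME = re.compile(r"[A-Za-z0-9._-]{1,249}")


def create_topic_statement(topic, partitions, replication_factor, broker_count):
    """Build an EXEC CreateTopic statement after basic sanity checks."""
    if not _TOPIC_NAME.fullmatch(topic) or topic in (".", ".."):
        raise ValueError(f"invalid topic name: {topic!r}")
    if partitions < 1:
        raise ValueError("Partitions must be at least 1")
    if not 1 <= replication_factor <= broker_count:
        raise ValueError(
            "ReplicationFactor must be between 1 and the number of brokers"
        )
    # Inputs are quoted because the procedure declares them as String
    # (assumed statement form).
    return (
        f"EXEC CreateTopic Topic = '{topic}', "
        f"Partitions = '{partitions}', "
        f"ReplicationFactor = '{replication_factor}'"
    )


statement = create_topic_statement(
    "orders", partitions=3, replication_factor=1, broker_count=1
)
print(statement)
```

Checking the replication factor against the broker count up front avoids a round trip that the broker would reject anyway.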

DeleteTopic

Deletes an existing topic from the Kafka broker. This stored procedure permanently removes the topic and all of its messages, so it should be used with caution in production environments.

Input


Name	Type	Required	Description
Topic	String	True	Specifies the name of the Kafka topic to delete. This input identifies the message stream that is permanently removed from the Kafka broker, including all associated partitions and stored data.

Result Set Columns


Name	Type	Description
Success	String	Returns a value of 'true' when the stored procedure successfully deletes the specified Kafka topic and a value of 'false' when the deletion fails because of permission restrictions or broker-level errors.

GetAdminConsentURL.rsb

Returns the administrator consent URL that must be opened in a web browser to authorize the application for domain-level access.

Input


Name	Type	Required	Description
CallbackUrl	String	False	Specifies the URL to which the user is redirected after authorizing the application. This value must match the Reply URL that is defined in the Azure Active Directory application settings.
State	String	False	Specifies the same state value that is sent when the application requests the authorization code. This input is used to maintain request integrity and to prevent cross-site request forgery (CSRF) attacks during the OAuth process.

Result Set Columns


Name	Type	Description
URL	String	Returns the administrator consent URL that must be entered in a web browser to obtain the verifier token and authorize the application for domain-level access.

ProduceMessage

Publishes a raw message to a specified Kafka topic. This stored procedure enables data producers to send serialized message payloads (for example, JSON or Avro) directly into the message stream for downstream consumers to process.

Minimum Required Inputs

When producing a message, only the Topic input is required:

EXEC ProduceMessage Topic = 'mytopic'

The other properties have the following default behaviors:

Partition: The native library determines the partition automatically, taking into account any relevant options from ProducerProperties.
KeyText and KeyBytes: If neither of these is provided, the Cloud produces a message without a key.
MessageText and MessageBytes: If neither of these is provided, the Cloud produces a message without any content.

Omitting the key or message content may make the message unreadable with certain Cloud settings. For example, setting MessageKeyType to integer may cause read failures with messages that have no keys. TypeDetectionScheme MessageOnly and MessageKeyType Binary are recommended if you need to read messages containing arbitrary content or keys.

KeyText/KeyBytes and MessageText/MessageBytes

The key and message can each be provided either as text or base64-encoded strings. KeyBytes and MessageBytes accept base64-encoded strings. The Cloud decodes these values into bytes before sending the message to Kafka.

KeyText and MessageText accept any text. The Cloud encodes these values as UTF-8 before sending the message to Kafka.

For example, the following statement inserts a message with no key and content that is a single NUL byte:

EXEC ProduceMessage Topic = 'mytopic', MessageBytes = 'AA=='
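
A value for KeyBytes or MessageBytes can be prepared with any standard Base64 encoder. The following sketch uses Python's base64 module; the helper name is hypothetical, and the statement string simply follows the EXEC form shown above.

```python
import base64


def to_message_bytes(payload: bytes) -> str:
    """Base64-encode raw bytes for the KeyBytes or MessageBytes input."""
    return base64.b64encode(payload).decode("ascii")


# A single NUL byte, as in the example above.
nul = to_message_bytes(b"\x00")

# Text can be sent the same way by encoding it as UTF-8 first.
hello = to_message_bytes("hello".encode("utf-8"))

statement = f"EXEC ProduceMessage Topic = 'mytopic', MessageBytes = '{nul}'"
print(nul, hello)
print(statement)
```

Decoding the value with base64.b64decode returns the original bytes, which is a quick way to confirm what will be written to the topic.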

Input


Name	Type	Required	Description
Topic	String	True	Specifies the Kafka topic that contains the message. This input identifies the message stream to which the data record is published.
Partition	String	False	Specifies the partition that the message is assigned to. The partition value must be valid for the specified topic. The native Kafka client automatically assigns a partition when this input is not set.
KeyText	String	False	Specifies the message key that is provided as readable text. The KeyText input defines the human-readable key portion of the Kafka message that determines how messages are partitioned. The value is encoded as UTF-8 before it is sent to Kafka. Do not set KeyText when KeyBytes input is provided.
KeyBytes	String	False	Specifies the message key that is provided as a Base64-encoded binary string. The KeyBytes input defines the binary representation of the same logical key as the KeyText parameter and is used when the key must preserve exact byte content (for example, hashed or serialized identifiers). Do not set KeyBytes when KeyText input is provided.
MessageText	String	False	Specifies the message value that is provided as readable text. The MessageText input defines the human-readable content of the Kafka message that is sent to the specified topic. The value is encoded as UTF-8 before it is sent to Kafka. Do not set MessageText when MessageBytes input is provided. The MessageBytes input represents the same message value in Base64-encoded binary format.
MessageBytes	String	False	Specifies the message value that is provided as a Base64-encoded binary string. The MessageBytes input defines the binary representation of the same message value as the MessageText input parameter and is used when the message content must preserve exact byte content (for example, serialized, compressed, or non-text data). Do not set MessageBytes when the MessageText input is provided.
Message	String	False	Indicates a deprecated input that functions the same as MessageBytes. This input is used only when MessageBytes is not provided. Both MessageBytes and Message input parameters must be omitted to send a message with no content.

Result Set Columns


Name	Type	Description
PartitionWritten	Int	Returns the partition that the message is written to. The value is identical to the Partition input when that parameter is specified.
OffsetWritten	Long	Returns the offset position within the partition where the message is written. This value enables message retrieval or replay from a specific point in the stream.
TimestampWritten	Long	Returns the UNIX timestamp that represents the exact instant when the message is committed to the partition.
KeyWritten	String	Returns the Base64-encoded data of the key that is written. This output is null when neither KeyText nor KeyBytes is provided.
MessageWritten	String	Returns the Base64-encoded data of the message value that is written. This output is null when neither MessageText nor MessageBytes is provided.
Success	Bool	Returns a value of 'true' when the stored procedure successfully writes the message to Kafka and a value of 'false' when the operation fails due to invalid parameters or broker-level errors.

System Tables

You can query the system tables described in this section to access schema information, information on data source functionality, and batch operation statistics.

Schema Tables

The following tables return database metadata for Apache Kafka:

sys_catalogs: Lists the available databases.
sys_schemas: Lists the available schemas.
sys_tables: Lists the available tables and views.
sys_tablecolumns: Describes the columns of the available tables and views.
sys_procedures: Describes the available stored procedures.
sys_procedureparameters: Describes stored procedure parameters.
sys_keycolumns: Describes the primary and foreign keys.
sys_indexes: Describes the available indexes.

Data Source Tables

The following tables return information about how to connect to and query the data source:

sys_connection_props: Returns information on the available connection properties.
sys_sqlinfo: Describes the SELECT queries that the Cloud can offload to the data source.

Query Information Tables

The following table returns query statistics for data modification queries:

sys_identity: Returns information about batch operations or single updates.

sys_catalogs

Lists the available databases.

The following query retrieves all databases determined by the connection string:

SELECT * FROM sys_catalogs

Columns


Name	Type	Description
CatalogName	String	The database name.

sys_schemas

Lists the available schemas.

The following query retrieves all available schemas:

SELECT * FROM sys_schemas

Columns


Name	Type	Description
CatalogName	String	The database name.
SchemaName	String	The schema name.

sys_tables

Lists the available tables.

The following query retrieves the available tables and views:

SELECT * FROM sys_tables

Columns


Name	Type	Description
CatalogName	String	The database containing the table or view.
SchemaName	String	The schema containing the table or view.
TableName	String	The name of the table or view.
TableType	String	The table type (table or view).
Description	String	A description of the table or view.
IsUpdateable	Boolean	Whether the table can be updated.

sys_tablecolumns

Describes the columns of the available tables and views.

The following query returns the columns and data types for the SampleTable_1 table:

SELECT ColumnName, DataTypeName FROM sys_tablecolumns WHERE TableName='SampleTable_1'

Columns


Name	Type	Description
CatalogName	String	The name of the database containing the table or view.
SchemaName	String	The schema containing the table or view.
TableName	String	The name of the table or view containing the column.
ColumnName	String	The column name.
DataTypeName	String	The data type name.
DataType	Int32	An integer indicating the data type. This value is determined at run time based on the environment.
Length	Int32	The storage size of the column.
DisplaySize	Int32	The designated column's normal maximum width in characters.
NumericPrecision	Int32	The maximum number of digits in numeric data. The column length in characters for character and date-time data.
NumericScale	Int32	The column scale or number of digits to the right of the decimal point.
IsNullable	Boolean	Whether the column can contain null.
Description	String	A brief description of the column.
Ordinal	Int32	The sequence number of the column.
IsAutoIncrement	String	Whether the column value is assigned in fixed increments.
IsGeneratedColumn	String	Whether the column is generated.
IsHidden	Boolean	Whether the column is hidden.
IsArray	Boolean	Whether the column is an array.
IsReadOnly	Boolean	Whether the column is read-only.
IsKey	Boolean	Indicates whether a field returned from sys_tablecolumns is the primary key of the table.
ColumnType	String	The role or classification of the column in the schema. Possible values include SYSTEM, LINKEDCOLUMN, NAVIGATIONKEY, REFERENCECOLUMN, and NAVIGATIONPARENTCOLUMN.

sys_procedures

Lists the available stored procedures.

The following query retrieves the available stored procedures:

SELECT * FROM sys_procedures

Columns


Name	Type	Description
CatalogName	String	The database containing the stored procedure.
SchemaName	String	The schema containing the stored procedure.
ProcedureName	String	The name of the stored procedure.
Description	String	A description of the stored procedure.
ProcedureType	String	The type of the procedure, such as PROCEDURE or FUNCTION.

sys_procedureparameters

Describes stored procedure parameters.

The following query returns information about all of the input parameters for the SampleProcedure stored procedure:

SELECT * FROM sys_procedureparameters WHERE ProcedureName = 'SampleProcedure' AND (Direction = 1 OR Direction = 2)
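
In standard SQL, AND binds more tightly than OR, so a Direction filter that uses OR needs parentheses to stay scoped to a single procedure. The following sketch demonstrates the difference with Python's built-in sqlite3 module on a stand-in table. The table name params and its rows are made up, nothing here queries the Cloud, and the sketch assumes the Cloud follows the standard precedence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE params (ProcedureName TEXT, ColumnName TEXT, Direction INTEGER)"
)
conn.executemany(
    "INSERT INTO params VALUES (?, ?, ?)",
    [
        ("SampleProcedure", "Topic", 1),    # input
        ("SampleProcedure", "Success", 4),  # output
        ("OtherProcedure", "Mode", 2),      # input/output on another procedure
    ],
)

# Without parentheses: (ProcedureName = ... AND Direction = 1) OR Direction = 2
ungrouped = conn.execute(
    "SELECT ColumnName FROM params "
    "WHERE ProcedureName = 'SampleProcedure' AND Direction = 1 OR Direction = 2 "
    "ORDER BY ColumnName"
).fetchall()

# With parentheses: the procedure filter applies to both directions.
grouped = conn.execute(
    "SELECT ColumnName FROM params "
    "WHERE ProcedureName = 'SampleProcedure' AND (Direction = 1 OR Direction = 2) "
    "ORDER BY ColumnName"
).fetchall()

print(ungrouped, grouped)
conn.close()
```

Without the parentheses, the Mode parameter of the unrelated procedure is returned as well; with them, only parameters of SampleProcedure match.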

To include result set columns in addition to the parameters, set the IncludeResultColumns pseudo column to True:

SELECT * FROM sys_procedureparameters WHERE ProcedureName = 'SampleProcedure' AND IncludeResultColumns='True'

Columns


Name	Type	Description
CatalogName	String	The name of the database containing the stored procedure.
SchemaName	String	The name of the schema containing the stored procedure.
ProcedureName	String	The name of the stored procedure containing the parameter.
ColumnName	String	The name of the stored procedure parameter.
Direction	Int32	An integer corresponding to the type of the parameter: input (1), input/output (2), or output (4). Input/output parameters can be both input and output parameters.
DataType	Int32	An integer indicating the data type. This value is determined at run time based on the environment.
DataTypeName	String	The name of the data type.
NumericPrecision	Int32	The maximum precision for numeric data. The column length in characters for character and date-time data.
Length	Int32	The number of characters allowed for character data. The number of digits allowed for numeric data.
NumericScale	Int32	The number of digits to the right of the decimal point in numeric data.
IsNullable	Boolean	Whether the parameter can contain null.
IsRequired	Boolean	Whether the parameter is required for execution of the procedure.
IsArray	Boolean	Whether the parameter is an array.
Description	String	The description of the parameter.
Ordinal	Int32	The index of the parameter.
Values	String	The values you can set in this parameter are limited to those shown in this column. Possible values are comma-separated.
SupportsStreams	Boolean	Whether the parameter represents a file that you can pass as either a file path or a stream.
IsPath	Boolean	Whether the parameter is a target path for a schema creation operation.
Default	String	The value used for this parameter when no value is specified.
SpecificName	String	A label that, when multiple stored procedures have the same name, uniquely identifies each identically-named stored procedure. If there's only one procedure with a given name, its name is simply reflected here.
IsCDataProvided	Boolean	Whether the procedure is added/implemented by CData, as opposed to being a native Apache Kafka procedure.

Pseudo-Columns


Name	Type	Description
IncludeResultColumns	Boolean	Whether the output should include columns from the result set in addition to parameters. Defaults to False.

CData Cloud

sys_keycolumns

Describes the primary and foreign keys.

The following query retrieves the primary key for the SampleTable_1 table:

         SELECT * FROM sys_keycolumns WHERE IsKey='True' AND TableName='SampleTable_1'

Columns


Name	Type	Description
CatalogName	String	The name of the database containing the key.
SchemaName	String	The name of the schema containing the key.
TableName	String	The name of the table containing the key.
ColumnName	String	The name of the key column.
IsKey	Boolean	Whether the column is a primary key in the table referenced in the TableName field.
IsForeignKey	Boolean	Whether the column is a foreign key referenced in the TableName field.
PrimaryKeyName	String	The name of the primary key.
ForeignKeyName	String	The name of the foreign key.
ReferencedCatalogName	String	The database containing the primary key.
ReferencedSchemaName	String	The schema containing the primary key.
ReferencedTableName	String	The table containing the primary key.
ReferencedColumnName	String	The column name of the primary key.

CData Cloud

sys_foreignkeys

Describes the foreign keys.

The following query retrieves all foreign keys which refer to other tables:

         SELECT * FROM sys_foreignkeys WHERE ForeignKeyType = 'FOREIGNKEY_TYPE_IMPORT'

Columns


Name	Type	Description
CatalogName	String	The name of the database containing the key.
SchemaName	String	The name of the schema containing the key.
TableName	String	The name of the table containing the key.
ColumnName	String	The name of the key column.
PrimaryKeyName	String	The name of the primary key.
ForeignKeyName	String	The name of the foreign key.
ReferencedCatalogName	String	The database containing the primary key.
ReferencedSchemaName	String	The schema containing the primary key.
ReferencedTableName	String	The table containing the primary key.
ReferencedColumnName	String	The column name of the primary key.
ForeignKeyType	String	Designates whether the foreign key is an import (points to other tables) or export (referenced from other tables) key.

CData Cloud

sys_primarykeys

Describes the primary keys.

The following query retrieves the primary keys from all tables and views:

         SELECT * FROM sys_primarykeys

Columns


Name	Type	Description
CatalogName	String	The name of the database containing the key.
SchemaName	String	The name of the schema containing the key.
TableName	String	The name of the table containing the key.
ColumnName	String	The name of the key column.
KeySeq	String	The sequence number of the primary key.
KeyName	String	The name of the primary key.

CData Cloud

sys_indexes

Describes the available indexes. By filtering on indexes, you can write more selective queries with faster query response times.

The following query retrieves all indexes that are not primary keys:

          SELECT * FROM sys_indexes WHERE IsPrimary='false'

Columns


Name	Type	Description
CatalogName	String	The name of the database containing the index.
SchemaName	String	The name of the schema containing the index.
TableName	String	The name of the table containing the index.
IndexName	String	The index name.
ColumnName	String	The name of the column associated with the index.
IsUnique	Boolean	True if the index is unique. False otherwise.
IsPrimary	Boolean	True if the index is a primary key. False otherwise.
Type	Int16	An integer value corresponding to the index type: statistic (0), clustered (1), hashed (2), or other (3).
SortOrder	String	The sort order: A for ascending or D for descending.
OrdinalPosition	Int16	The sequence number of the column in the index.

CData Cloud

sys_connection_props

Returns information on the available connection properties and those set in the connection string.

The following query retrieves all connection properties that have been set in the connection string or set through a default value:

SELECT * FROM sys_connection_props WHERE Value <> ''

Columns


Name	Type	Description
Name	String	The name of the connection property.
ShortDescription	String	A brief description.
Type	String	The data type of the connection property.
Default	String	The default value if one is not explicitly set.
Values	String	A comma-separated list of possible values. A validation error is thrown if another value is specified.
Value	String	The value you set or a preconfigured default.
Required	Boolean	Whether the property is required to connect.
Category	String	The category of the connection property.
IsSessionProperty	String	Whether the property is a session property, used to save information about the current connection.
Sensitivity	String	The sensitivity level of the property. This informs whether the property is obfuscated in logging and authentication forms.
PropertyName	String	A camel-cased truncated form of the connection property name.
Ordinal	Int32	The index of the parameter.
CatOrdinal	Int32	The index of the parameter category.
Hierarchy	String	Shows the dependent properties that must be set alongside this one.
Visible	Boolean	Whether the property is visible in the connection UI.
ETC	String	Miscellaneous information about the property.

CData Cloud

sys_sqlinfo

Describes the SELECT query processing that the Cloud can offload to the data source.

See SQL Compliance for SQL syntax details.

Discovering the Data Source's SELECT Capabilities

Below is an example data set of SQL capabilities. Some aspects of SELECT functionality are returned in a comma-separated list if supported; otherwise, the column contains NO.


Name	Description	Possible Values
AGGREGATE_FUNCTIONS	Supported aggregation functions.	AVG, COUNT, MAX, MIN, SUM, DISTINCT
COUNT	Whether COUNT function is supported.	YES, NO
IDENTIFIER_QUOTE_OPEN_CHAR	The opening character used to escape an identifier.	[
IDENTIFIER_QUOTE_CLOSE_CHAR	The closing character used to escape an identifier.	]
SUPPORTED_OPERATORS	A list of supported SQL operators.	=, >, <, >=, <=, <>, !=, LIKE, NOT LIKE, IN, NOT IN, IS NULL, IS NOT NULL, AND, OR
GROUP_BY	Whether GROUP BY is supported, and, if so, the degree of support.	NO, NO_RELATION, EQUALS_SELECT, SQL_GB_COLLATE
OJ_CAPABILITIES	The varieties of outer joins supported.	NO, LEFT, RIGHT, FULL, INNER, NOT_ORDERED, ALL_COMPARISON_OPS
OUTER_JOINS	Whether outer joins are supported.	YES, NO
SUBQUERIES	Whether subqueries are supported, and, if so, the degree of support.	NO, COMPARISON, EXISTS, IN, CORRELATED_SUBQUERIES, QUANTIFIED
STRING_FUNCTIONS	Supported string functions.	LENGTH, CHAR, LOCATE, REPLACE, SUBSTRING, RTRIM, LTRIM, RIGHT, LEFT, UCASE, SPACE, SOUNDEX, LCASE, CONCAT, ASCII, REPEAT, OCTET, BIT, POSITION, INSERT, TRIM, UPPER, REGEXP, LOWER, DIFFERENCE, CHARACTER, SUBSTR, STR, REVERSE, PLAN, UUIDTOSTR, TRANSLATE, TRAILING, TO, STUFF, STRTOUUID, STRING, SPLIT, SORTKEY, SIMILAR, REPLICATE, PATINDEX, LPAD, LEN, LEADING, KEY, INSTR, INSERTSTR, HTML, GRAPHICAL, CONVERT, COLLATION, CHARINDEX, BYTE
NUMERIC_FUNCTIONS	Supported numeric functions.	ABS, ACOS, ASIN, ATAN, ATAN2, CEILING, COS, COT, EXP, FLOOR, LOG, MOD, SIGN, SIN, SQRT, TAN, PI, RAND, DEGREES, LOG10, POWER, RADIANS, ROUND, TRUNCATE
TIMEDATE_FUNCTIONS	Supported date/time functions.	NOW, CURDATE, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, MONTH, QUARTER, WEEK, YEAR, CURTIME, HOUR, MINUTE, SECOND, TIMESTAMPADD, TIMESTAMPDIFF, DAYNAME, MONTHNAME, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, EXTRACT
REPLICATION_SKIP_TABLES	Indicates tables skipped during replication.
REPLICATION_TIMECHECK_COLUMNS	A string array listing the columns to check, in the given order, for use as the modified column during replication.
IDENTIFIER_PATTERN	String value indicating what string is valid for an identifier.
SUPPORT_TRANSACTION	Indicates if the provider supports transactions such as commit and rollback.	YES, NO
DIALECT	Indicates the SQL dialect to use.
KEY_PROPERTIES	Indicates the properties which identify the uniform database.
SUPPORTS_MULTIPLE_SCHEMAS	Indicates if multiple schemas may exist for the provider.	YES, NO
SUPPORTS_MULTIPLE_CATALOGS	Indicates if multiple catalogs may exist for the provider.	YES, NO
DATASYNCVERSION	The CData Data Sync version needed to access this driver.	Standard, Starter, Professional, Enterprise
DATASYNCCATEGORY	The CData Data Sync category of this driver.	Source, Destination, Cloud Destination
SUPPORTSENHANCEDSQL	Whether enhanced SQL functionality beyond what is offered by the API is supported.	TRUE, FALSE
SUPPORTS_BATCH_OPERATIONS	Whether batch operations are supported.	YES, NO
SQL_CAP	All supported SQL capabilities for this driver.	SELECT, INSERT, DELETE, UPDATE, TRANSACTIONS, ORDERBY, OAUTH, ASSIGNEDID, LIMIT, LIKE, BULKINSERT, COUNT, BULKDELETE, BULKUPDATE, GROUPBY, HAVING, AGGS, OFFSET, REPLICATE, COUNTDISTINCT, JOINS, DROP, CREATE, DISTINCT, INNERJOINS, SUBQUERIES, ALTER, MULTIPLESCHEMAS, GROUPBYNORELATION, OUTERJOINS, UNIONALL, UNION, UPSERT, GETDELETED, CROSSJOINS, GROUPBYCOLLATE, MULTIPLECATS, FULLOUTERJOIN, MERGE, JSONEXTRACT, BULKUPSERT, SUM, SUBQUERIESFULL, MIN, MAX, JOINSFULL, XMLEXTRACT, AVG, MULTISTATEMENTS, FOREIGNKEYS, CASE, LEFTJOINS, COMMAJOINS, WITH, LITERALS, RENAME, NESTEDTABLES, EXECUTE, BATCH, BASIC, INDEX
PREFERRED_CACHE_OPTIONS	A string value that specifies the preferred cacheOptions.
ENABLE_EF_ADVANCED_QUERY	Indicates if the driver directly supports advanced queries coming from Entity Framework. If not, queries will be handled client side.	YES, NO
PSEUDO_COLUMNS	A string array indicating the available pseudo columns.
MERGE_ALWAYS	If the value is true, Merge Mode is forcibly executed in Data Sync.	TRUE, FALSE
REPLICATION_MIN_DATE_QUERY	A select query to return the replicate start datetime.
REPLICATION_MIN_FUNCTION	Allows a provider to specify the formula name to use for executing a server side min.
REPLICATION_START_DATE	Allows a provider to specify a replicate startdate.
REPLICATION_MAX_DATE_QUERY	A select query to return the replicate end datetime.
REPLICATION_MAX_FUNCTION	Allows a provider to specify the formula name to use for executing a server side max.
IGNORE_INTERVALS_ON_INITIAL_REPLICATE	A list of tables which will skip dividing the replicate into chunks on the initial replicate.
CHECKCACHE_USE_PARENTID	Indicates whether the CheckCache statement should be done against the parent key column.	TRUE, FALSE
CREATE_SCHEMA_PROCEDURES	Indicates stored procedures that can be used for generating schema files.

The following query retrieves the operators that can be used in the WHERE clause:

SELECT * FROM sys_sqlinfo WHERE Name = 'SUPPORTED_OPERATORS'

Note that individual tables may have different limitations or requirements on the WHERE clause; refer to the Data Model section for more information.
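
Because list-type capabilities come back as a single comma-separated string (or NO when unsupported), a client usually has to split the value before checking for a specific item. The helper below is a minimal sketch of that step. The function names are hypothetical, and the sample string is copied from the possible values listed above rather than from a live connection.

```python
def parse_capability(value):
    """Split a sys_sqlinfo VALUE string into items; 'NO' or empty means none."""
    text = value.strip()
    if not text or text.upper() == "NO":
        return []
    return [item.strip() for item in text.split(",") if item.strip()]


def supports(value, item):
    """Case-insensitive membership test against a capability string."""
    wanted = item.strip().upper()
    return any(entry.upper() == wanted for entry in parse_capability(value))


# Sample VALUE for the SUPPORTED_OPERATORS row, taken from the table above.
operators = (
    "=, >, <, >=, <=, <>, !=, LIKE, NOT LIKE, IN, NOT IN, "
    "IS NULL, IS NOT NULL, AND, OR"
)

can_use_not_like = supports(operators, "not like")
can_use_between = supports(operators, "BETWEEN")
```

Rows whose value is simply YES parse to a one-item list, so the same helper can be reused for YES/NO capabilities by checking for "YES".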

Columns


Name	Type	Description
NAME	String	A component of SQL syntax, or a capability that can be processed on the server.
VALUE	String	Detail on the supported SQL or SQL syntax.

CData Cloud

sys_identity

Returns information about attempted modifications.

The following query retrieves the Ids of the modified rows in a batch operation:

         SELECT * FROM sys_identity

Columns


Name	Type	Description
Id	String	The database-generated Id returned from a data modification operation.
Batch	String	An identifier for the batch. 1 for a single operation.
Operation	String	The result of the operation in the batch: INSERTED, UPDATED, or DELETED.
Message	String	SUCCESS or an error message if the update in the batch failed.

CData Cloud

sys_information

Describes the available system information.

The following query retrieves all columns:

SELECT * FROM sys_information

Columns


Name	Type	Description
Product	String	The name of the product.
Version	String	The version number of the product.
Datasource	String	The name of the datasource the product connects to.
NodeId	String	The unique identifier of the machine where the product is installed.
HelpURL	String	The URL to the product's help documentation.
License	String	The license information for the product. (If this information is not available, the field may be left blank or marked as 'N/A'.)
Location	String	The file path location where the product's library is stored.
Environment	String	The version of the environment or runtime the product is currently running under.
DataSyncVersion	String	The tier of CData Sync required to use this connector.
DataSyncCategory	String	The category of CData Sync functionality (e.g., Source, Destination).

CData Cloud

Connection String Options

The connection string properties are the various options that can be used to establish a connection. This section provides a complete list of the options you can configure in the connection string for this provider. Click the links for further details.

For more information on establishing a connection, see Establishing a Connection.

Authentication

Property	Description
AuthScheme	Specifies the authentication scheme that the provider uses when connecting to the Apache Kafka broker.
User	Specifies the username used to authenticate to the Apache Kafka broker.
Password	Specifies the password used to authenticate to Apache Kafka for the selected authentication scheme.
BootstrapServers	Specifies the Kafka bootstrap servers that the provider uses to establish the initial connection to the Kafka cluster.
UseSSL	Specifies whether the provider negotiates SSL/TLS when connecting to the Apache Kafka broker.

Connection

Property	Description
ConsumerGroupId	Specifies the consumer group that the provider uses when reading messages from Apache Kafka.
AutoCommit	Specifies whether the Apache Kafka consumer automatically commits read offsets.

Azure Authentication

Property	Description
AzureTenant	Identifies the Apache Kafka tenant being used to access data. Accepts either the tenant's domain name (for example, contoso.onmicrosoft.com) or its directory (tenant) ID.
AzureResource	The Azure Active Directory resource to authenticate to (used during Azure OAuth exchange).

OAuth

Property	Description
OAuthClientId	Specifies the client ID (also known as the consumer key) assigned to your custom OAuth application. This ID is required to identify the application to the OAuth authorization server during authentication.
OAuthClientSecret	Specifies the client secret assigned to your custom OAuth application. This confidential value is used to authenticate the application to the OAuth authorization server. (Custom OAuth applications only.)
DelegatedServiceAccounts	Specifies a space-delimited list of service account emails for delegated requests.
RequestingServiceAccount	Specifies a service account email to make a delegated request.
Scope	Specifies the scopes to request when obtaining an OAuth token from the OIDC token endpoint.
OAuthAccessTokenURL	The URL from which the OAuth access token is retrieved.

JWT OAuth

Property	Description
OAuthJWTCert	Supplies the name of the client certificate's JWT Certificate store.
OAuthJWTCertType	Identifies the type of key store containing the JWT Certificate.
OAuthJWTCertPassword	Provides the password for the OAuth JWT certificate used to access a password-protected certificate store. If the certificate store does not require a password, leave this property blank.
OAuthJWTCertSubject	Identifies the subject of the OAuth JWT certificate used to locate a matching certificate in the store. Supports partial matches and the wildcard '*' to select the first certificate.

Kerberos

Property	Description
KerberosKeytabFile	Specifies the path to the keytab file that contains the Kerberos principals and encrypted keys used for authentication.
KerberosSPN	Specifies the full Kerberos service principal name (SPN) of the Kafka broker.
KerberosServiceName	Specifies the Kerberos service name used when authenticating to the Kafka broker.
UseKerberosTicketCache	Specifies whether the provider uses the Kerberos ticket cache for authentication instead of a keytab file.

SSL

Property	Description
SSLServerCert	Specifies the SSL server certificate or certificate store used to verify the identity of the Apache Kafka broker.
SSLServerCertType	Specifies the format of the SSL server certificate used to verify the Apache Kafka broker.
SSLClientCert	Specifies the SSL client certificate used to authenticate with the Apache Kafka broker.
SSLClientCertType	Specifies the format of the SSL client certificate used to connect to the Apache Kafka broker.
SSLClientCertPassword	Specifies the password used to decrypt the certificate provided in SSLClientCert.
SSLIdentificationAlgorithm	Specifies the endpoint identification algorithm used to validate the server host name during SSL/TLS connections.

Schema Registry

Property	Description
RegistryURL	Specifies the endpoint of the schema registry. When this property is specified, the driver supports reading Avro and JSON schemas from the server.
RegistryService	Specifies the schema registry service that the provider uses to retrieve key and value schemas for Apache Kafka topics.
RegistryAuthScheme	Specifies the authentication scheme that the provider uses when connecting to the schema registry.
RegistryUser	Specifies the user name used when authenticating to the schema registry with the Basic authentication scheme.
RegistryPassword	Specifies the password used when authenticating to the schema registry with the Basic authentication scheme.
RegistryVersion	Specifies which version of a schema the provider retrieves from the schema registry when resolving topic columns.
RegistryServerCert	The certificate to be accepted from the schema registry when connecting using TLS/SSL.
SchemaMergeMode	Specifies how the provider exposes schemas with multiple versions.

Logging

Property	Description
Verbosity	Specifies the verbosity level of the log file, which controls the amount of detail logged. Supported values range from 1 to 5.

Schema

Property	Description
BrowsableSchemas	Optional setting that restricts the schemas reported to a subset of all available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.

Miscellaneous

Property	Description
AllowKeyOnlyRegistryTopics	Specifies whether the provider exposes key-only Schema Registry topics as tables.
AWSWorkloadIdentityConfig	Configuration properties to provide when using Workload Identity Federation via AWS.
AzureWorkloadIdentityConfig	Configuration properties to provide when using Workload Identity Federation via Azure.
CompressionType	Specifies the compression algorithm that the provider uses when producing messages to Apache Kafka.
ConsumerProperties	Specifies additional Kafka consumer configuration options that the provider passes directly to the underlying Kafka client.
CreateTablePartitions	Specifies the number of partitions to assign to a topic created through a CREATE TABLE statement.
CreateTableReplicationFactor	Specifies the number of replicas to assign to a topic created through a CREATE TABLE statement.
EnableIdempotence	Specifies whether the provider ensures that produced messages are delivered in order and without duplicates.
ExposeQueueMetadataColumns	Specifies whether the Partition, Offset, and Timestamp columns are exposed.
FlattenArrays	Specifies how many elements to return from nested arrays when TypeDetectionScheme is set to SchemaRegistry.
HideUnusedColumns	Determines whether to hide key or value columns when the topic has no associated schema information.
MaximumBatchSize	Specifies the maximum size, in bytes, of a batch of messages that the provider gathers before sending the batch to Apache Kafka.
MaxRows	Specifies the maximum number of rows returned for queries that do not include either aggregation or GROUP BY.
MessageKeyColumn	Specifies the name of the column where the provider stores the message key for each record.
MessageKeyType	The type of data stored in message keys.
NonRegistryTypeDetectionScheme	Specifies the TypeDetectionScheme to use for topics that do not have schemas in the schema registry.
OffsetResetStrategy	Specifies how the provider determines the starting offset when no committed offset exists for the consumer group.
Pagesize	Specifies the maximum number of rows that the provider retrieves from Apache Kafka in a single read operation.
ProducerProperties	Specifies additional Apache Kafka producer configuration options that the provider passes directly to the client.
PseudoColumns	Specifies the pseudocolumns to expose as table columns, expressed as a string in the format 'TableName=ColumnName;TableName=ColumnName'.
ReadDuration	Specifies how long, in seconds, the provider waits for additional messages after a read operation begins.
RowScanDepth	Specifies the maximum number of messages that the provider scans to determine the columns and data types for a topic.
SASLOAuthExtensions	Specifies the extension values to send with OAuth auth schemes.
SchemaRegistryOnly	Specifies whether the provider connects only to the schema registry.
SerializationFormat	Specifies how to serialize/deserialize message contents.
ThrowsKeyNotFound	Specifies whether the provider throws an exception if no rows are updated.
Timeout	Specifies the maximum time, in seconds, that the provider waits for a server response before throwing a timeout error.
TypeDetectionScheme	Specifies how the provider determines the available fields and data types for each topic.
UseConfluentAvroFormat	Specifies how Avro data should be formatted during an INSERT.
ValidateRegistryTopics	Specifies whether or not to validate schema registry topics against the Apache Kafka broker. Only has an effect when TypeDetectionScheme is set to SchemaRegistry.
WorkloadPoolId	The ID of your Workload Identity Federation pool.
WorkloadProjectId	The ID of the Google Cloud project that hosts your Workload Identity Federation pool.
WorkloadProviderId	The ID of your Workload Identity Federation pool provider.

CData Cloud

Authentication

This section provides a complete list of the Authentication properties you can configure in the connection string for this provider.

Property	Description
AuthScheme	Specifies the authentication scheme that the provider uses when connecting to the Apache Kafka broker.
User	Specifies the username used to authenticate to the Apache Kafka broker.
Password	Specifies the password used to authenticate to Apache Kafka for the selected authentication scheme.
BootstrapServers	Specifies the Kafka bootstrap servers that the provider uses to establish the initial connection to the Kafka cluster.
UseSSL	Specifies whether the provider negotiates SSL/TLS when connecting to the Apache Kafka broker.

CData Cloud

AuthScheme

Specifies the authentication scheme that the provider uses when connecting to the Apache Kafka broker.

Possible Values

None, Plain, SCRAM, SCRAM-SHA-512, Kerberos, SSLCertificate, KafkaOAuthClient, AzureAD, AzureServicePrincipal, AzureServicePrincipalCert, OAuthJWT, AWSWorkloadIdentity, AzureWorkloadIdentity

Data Type

string

Default Value

"None"

Remarks

Authentication requirements vary by Kafka deployment. The Cloud supports several authentication schemes used by self-managed Kafka clusters, cloud-managed Kafka services, and Schema Registry environments. The schemes listed below determine how the Cloud obtains credentials and establishes a secure connection to the broker.

Supported schemes for Apache Kafka:


Scheme	Description
None	Connects without authentication. No credentials are required.
Plain	Authenticates using a plain text login module. Requires User and Password.
SCRAM	Authenticates using a SCRAM login module with SHA-256 hashing.
SCRAM-SHA-512	Authenticates using a SCRAM login module with SHA-512 hashing.
Kerberos	Authenticates using Kerberos. Requires a Kerberos configuration file and relevant Kerberos properties.
SSLCertificate	Authenticates using an SSL client certificate.
KafkaOAuthClient	Authenticates using Kafka's native OpenID Connect (KIP-768) authentication with the client credentials grant type. Note: This does not use the Cloud's built-in support for OAuth like the Azure and OAuthJWT authentication methods do. Only the OAuthClientId, OAuthClientSecret, OAuthAccessTokenURL, and Scope properties have any effect on the behavior of this scheme. Use ConsumerProperties and ProducerProperties to provide advanced configuration.
AzureAD	Authenticates using Azure Active Directory OAuth.
AzureMSI	Obtains Managed Service Identity credentials automatically when running on an Azure VM.
AzureServicePrincipal	Authenticates as an Azure Service Principal using a client secret.
AzureServicePrincipalCert	Authenticates as an Azure Service Principal using a certificate.
OAuthJWT	Authenticates using an OAuth service account with JWT-based credential flow.
GCPInstanceAccount	Authenticates using an access token obtained from a Google Cloud instance.
AWSWorkloadIdentity	Authenticates using AWS Workload Identity Federation.

Schemes for authenticating to Azure Event Hubs:


Scheme	Description
AzureAD	Authenticates using Azure Active Directory OAuth.
AzureMSI	Obtains Managed Service Identity credentials automatically when running on an Azure VM.
AzureServicePrincipal	Authenticates as an Azure Service Principal using a client secret.
AzureServicePrincipalCert	Authenticates as an Azure Service Principal using a certificate.

Schemes for authenticating to GMS Kafka:


Scheme	Description
OAuthJWT	Authenticates using an OAuth service account.
GCPInstanceAccount	Authenticates using a Google Cloud instance account.
AWSWorkloadIdentity	Authenticates using AWS Workload Identity Federation. GMS Kafka does not allow external principals to authenticate directly, so you must delegate authentication to a service account using the RequestingServiceAccount property.

CData Cloud

User

Specifies the username used to authenticate to the Apache Kafka broker.

Data Type

string

Default Value

Remarks

When using an authentication scheme that requires credentials, this property provides the username associated with the connection. If this property is left blank, the Cloud attempts an unauthenticated connection, which will fail unless the broker allows anonymous access.

CData Cloud

Password

Specifies the password used to authenticate to Apache Kafka for the selected authentication scheme.

Data Type

string

Default Value

Remarks

This property is required when using the Plain, SCRAM, or SCRAM-SHA-512 authentication schemes. If this property is not set when one of these schemes is selected, the connection will fail during authentication.
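
A pre-flight check can catch a missing credential before the connection is attempted. The following is a minimal sketch that encodes only the requirements stated in this documentation (User and Password for Plain, Password for SCRAM and SCRAM-SHA-512). The function name and the use of a plain dictionary for the connection properties are assumptions made for illustration, not part of the Cloud.

```python
# Requirements as stated in this documentation; other schemes are not checked.
REQUIRED_BY_SCHEME = {
    "Plain": ("User", "Password"),
    "SCRAM": ("Password",),
    "SCRAM-SHA-512": ("Password",),
}


def missing_required(props):
    """Return the required credential properties that are unset or empty."""
    scheme = props.get("AuthScheme", "None")  # "None" is the documented default
    required = REQUIRED_BY_SCHEME.get(scheme, ())
    return [name for name in required if not props.get(name)]


# Hypothetical property sets used to exercise the check.
incomplete = {"AuthScheme": "Plain", "User": "alice"}
complete = {"AuthScheme": "SCRAM-SHA-512", "User": "alice", "Password": "secret"}

problems = missing_required(incomplete)
no_problems = missing_required(complete)
```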

CData Cloud

BootstrapServers

Specifies the Kafka bootstrap servers that the provider uses to establish the initial connection to the Kafka cluster.

Data Type

string

Default Value

Remarks

Specify each server using a hostname or IP address followed by a port number. For example: 10.1.2.3:9092. You may provide multiple comma-separated addresses. The connection succeeds as long as the Cloud can reach at least one of the listed servers.

Kafka bootstrap servers are not responsible for all messaging traffic. They are used only to retrieve metadata, after which the Cloud connects to the appropriate brokers in the cluster.

If you are connecting to Confluent Cloud, the bootstrap server address is available in the Cluster settings.
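
The host:port, comma-separated format described above can be validated before the value is saved. The parser below is a minimal sketch of such a check. The function name and the second host name are hypothetical, and the port range check reflects ordinary TCP port limits rather than anything specific to the Cloud.

```python
def parse_bootstrap_servers(value):
    """Parse 'host:port,host:port' into a list of (host, port) tuples."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        # Split on the last colon so the host part is kept whole.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not (port.isascii() and port.isdigit()):
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        servers.append((host, port_number))
    return servers


servers = parse_bootstrap_servers("10.1.2.3:9092, broker2.example.com:9093")
```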

CData Cloud

UseSSL

Specifies whether the provider negotiates SSL/TLS when connecting to the Apache Kafka broker.

Data Type

bool

Default Value

false

Remarks

This option is enabled automatically when AuthScheme is set to SSLCertificate.

CData Cloud

Connection

This section provides a complete list of the Connection properties you can configure in the connection string for this provider.

Property	Description
ConsumerGroupId	Specifies the consumer group that the provider uses when reading messages from Apache Kafka.
AutoCommit	Specifies whether the Apache Kafka consumer automatically commits read offsets.

CData Cloud

ConsumerGroupId

Specifies the consumer group that the provider uses when reading messages from Apache Kafka.

Data Type

string

Default Value

Remarks

A consumer group is a logical identifier that Kafka uses to track committed offsets and coordinate message consumption across multiple consumers. All consumers that share the same group ID use the same committed offsets and divide message processing among themselves.

If this property is not specified, the Cloud generates a unique random group ID for each connection. In this mode, offsets are not shared across connections, and each connection reads messages independently of others. Set this property when you want multiple connections or applications to share the same committed offsets, or when you are using features that depend on offset tracking, such as AutoCommit.
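
The effect of sharing or omitting a group ID can be pictured with a small model. The sketch below is a toy illustration only: the OffsetStore class, the open_connection helper, the format of the generated ID, and the SampleTopic name are all hypothetical and do not describe how the Cloud or Kafka store offsets internally.

```python
import uuid


class OffsetStore:
    """Toy record of committed offsets, keyed by (group ID, topic)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group_id, topic, offset):
        self._committed[(group_id, topic)] = offset

    def committed(self, group_id, topic):
        # None means the group has no committed offset for the topic.
        return self._committed.get((group_id, topic))


def open_connection(consumer_group_id=None):
    """Return the group ID a connection uses; random when none is configured."""
    return consumer_group_id or f"random-{uuid.uuid4()}"


store = OffsetStore()
first = open_connection("reporting")
second = open_connection("reporting")
anonymous_a = open_connection()
anonymous_b = open_connection()

# A commit made through one connection is visible to every member of the group.
store.commit(first, "SampleTopic", 42)
shared = store.committed(second, "SampleTopic")
isolated = store.committed(anonymous_a, "SampleTopic")
```

Connections that share the "reporting" ID see the same committed offset, while each connection without a configured ID gets its own group and therefore no committed offset.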

CData Cloud

AutoCommit

Specifies whether the Apache Kafka consumer automatically commits read offsets.

Data Type

bool

Default Value

false

Remarks

By default, the Cloud does not commit read offsets unless you invoke CommitOffset. When an offset is committed for a topic, the Cloud starts reading from that offset in future queries until the next call to CommitOffset. If no offsets are committed, each query begins reading from the position determined by OffsetResetStrategy.

When set to true, the Cloud commits offsets periodically and also at the end of each SELECT query. This causes each SELECT query to consume the messages it reads, preventing future queries in the same consumer group from returning those messages. A ConsumerGroupId is required because the committed offset is shared across the consumer group, allowing later connections with the same group ID to use the same offsets.

Consider the following when enabling AutoCommit:

Queries with OFFSET: Some queries may consume more messages than they return. For example, a SELECT query with an OFFSET clause consumes messages up to the specified offset and then returns rows after that point. Messages consumed to satisfy the OFFSET are discarded and cannot be read again within the same consumer group.
TypeDetectionScheme Behavior: The None and RowScan options in TypeDetectionScheme perform internal reads on the topic to identify available columns. Because AutoCommit relies on periodic commits performed by the underlying Apache Kafka client, offsets may be committed during these internal reads. To avoid this, use SchemaRegistry or MessageOnly in TypeDetectionScheme.
Message-Based Offsets: AutoCommit operates on messages, not rows. Some SerializationFormat options can read multiple rows from a single Kafka message, such as when a Kafka message contains a JSON array. If a SELECT query reads only a subset of rows from that message, Kafka treats the entire message as consumed, meaning any unread rows within that message are discarded.
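
The considerations above can be illustrated with a toy model of a single-partition topic. This sketch is an assumption-laden simplification, not the Cloud's code: select_with_autocommit is a hypothetical helper, offsets are modeled as list indexes, and every message read is treated as committed at the end of the query.

```python
import json
from typing import List, Optional, Tuple


def select_with_autocommit(messages: List[str], committed: int,
                           offset: int = 0,
                           limit: Optional[int] = None) -> Tuple[List[dict], int]:
    """Read rows starting at the committed message index.

    Returns the selected rows and the new committed index.
    """
    rows: List[dict] = []
    position = committed
    needed = None if limit is None else offset + limit
    while position < len(messages):
        if needed is not None and len(rows) >= needed:
            break
        payload = json.loads(messages[position])
        # One message may expand into several rows (for example, a JSON array).
        rows.extend(payload if isinstance(payload, list) else [payload])
        position += 1  # the whole message counts as consumed
    selected = rows[offset:] if limit is None else rows[offset:offset + limit]
    return selected, position


topic = [
    json.dumps([{"id": 1}, {"id": 2}, {"id": 3}]),
    json.dumps([{"id": 4}, {"id": 5}]),
]

# LIMIT 2 reads only part of the first message, but the message is consumed whole.
first, committed = select_with_autocommit(topic, committed=0, limit=2)
second, _ = select_with_autocommit(topic, committed=committed)

# OFFSET 3 LIMIT 1 consumes both messages to return a single row.
skipped, committed_after_offset = select_with_autocommit(topic, committed=0, offset=3, limit=1)
```

In this model the row with id 3 is never returned by either of the first two queries, and the OFFSET query consumes both messages while returning one row.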

CData Cloud

Azure Authentication

This section provides a complete list of the Azure Authentication properties you can configure in the connection string for this provider.

Property	Description
AzureTenant	Identifies the Microsoft Entra ID tenant used to access Apache Kafka data. Accepts either the tenant's domain name (for example, contoso.onmicrosoft.com) or its directory (tenant) ID.
AzureResource	The Azure Active Directory resource to authenticate to (used during the Azure OAuth exchange).

CData Cloud

AzureTenant

Identifies the Microsoft Entra ID tenant used to access Apache Kafka data. Accepts either the tenant's domain name (for example, contoso.onmicrosoft.com) or its directory (tenant) ID.

Data Type

string

Default Value

Remarks

A tenant is a digital container for your organization's users and resources, managed through Microsoft Entra ID (formerly Azure AD). Each tenant is associated with a unique directory ID, and often with a custom domain (for example, microsoft.com or contoso.onmicrosoft.com).

To find the directory (tenant) ID in the Microsoft Entra Admin Center, navigate to Microsoft Entra ID > Properties and copy the value labeled "Directory (tenant) ID".

This property is required in the following cases:

When AuthScheme is set to AzureServicePrincipal or AzureServicePrincipalCert
When AuthScheme is AzureAD and the user account belongs to multiple tenants

You can provide the tenant value in one of two formats:

A domain name (for example, contoso.onmicrosoft.com)
A directory (tenant) ID in GUID format (for example, c9d7b8e4-1234-4f90-bc1a-2a28e0f9e9e0)

Specifying the tenant explicitly ensures that the authentication request is routed to the correct directory, which is especially important when a user belongs to multiple tenants or when using service principal–based authentication.

If this value is omitted when required, authentication may fail or connect to the wrong tenant. This can result in errors such as unauthorized or resource not found.
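
A quick way to tell the two accepted formats apart before saving a connection is sketched below. The helper name tenant_format and its return labels are hypothetical, and the domain check is deliberately loose. It is a client-side sanity check only and does not confirm that the tenant exists.

```python
import uuid


def tenant_format(value: str) -> str:
    """Classify an AzureTenant value as 'directory-id', 'domain', or 'unrecognized'."""
    try:
        # uuid.UUID also accepts a few alternative spellings (braces, no hyphens).
        uuid.UUID(value)
        return "directory-id"
    except ValueError:
        pass
    # Loose check: a domain has at least one dot and no whitespace.
    if "." in value and not any(ch.isspace() for ch in value):
        return "domain"
    return "unrecognized"


print(tenant_format("contoso.onmicrosoft.com"))
print(tenant_format("c9d7b8e4-1234-4f90-bc1a-2a28e0f9e9e0"))
```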

CData Cloud

AzureResource

The Azure Active Directory resource to authenticate to (used during the Azure OAuth exchange).

Data Type

string

Default Value

Remarks

The resource must be specified if using Azure OAuth. It should be set to the App Id URI of the web API (secured resource).

CData Cloud

OAuth

This section provides a complete list of the OAuth properties you can configure in the connection string for this provider.

Property	Description
OAuthClientId	Specifies the client ID (also known as the consumer key) assigned to your custom OAuth application. This ID is required to identify the application to the OAuth authorization server during authentication.
OAuthClientSecret	Specifies the client secret assigned to your custom OAuth application. This confidential value is used to authenticate the application to the OAuth authorization server. (Custom OAuth applications only.)
DelegatedServiceAccounts	Specifies a space-delimited list of service account emails for delegated requests.
RequestingServiceAccount	Specifies a service account email to make a delegated request.
Scope	Specifies the scopes to request when obtaining an OAuth token from the OIDC token endpoint.
OAuthAccessTokenURL	The URL from which the OAuth access token is retrieved.

CData Cloud

OAuthClientId

Specifies the client ID (also known as the consumer key) assigned to your custom OAuth application. This ID is required to identify the application to the OAuth authorization server during authentication.

Data Type

string

Default Value

Remarks

This property is required in two cases:

When using a custom OAuth application, such as in web-based authentication flows, service-based authentication, or certificate-based flows that require application registration.
If the driver does not provide embedded OAuth credentials.

(When the driver provides embedded OAuth credentials, this value may already be provided by the Cloud and thus not require manual entry.)

OAuthClientId is generally used alongside other OAuth-related properties such as OAuthClientSecret and OAuthSettingsLocation when configuring an authenticated connection.

OAuthClientId is one of the key connection parameters that need to be set before users can authenticate via OAuth. You can usually find this value in your identity provider’s application registration settings. Look for a field labeled Client ID, Application ID, or Consumer Key.

While the client ID is not considered a confidential value like a client secret, it is still part of your application's identity and should be handled carefully. Avoid exposing it in public repositories or shared configuration files.

For more information on how this property is used when configuring a connection, see Establishing a Connection.

CData Cloud

OAuthClientSecret

Specifies the client secret assigned to your custom OAuth application. This confidential value is used to authenticate the application to the OAuth authorization server. (Custom OAuth applications only.)

Data Type

string

Default Value

Remarks

This property (sometimes called the application secret or consumer secret) is required when using a custom OAuth application in any flow that requires secure client authentication, such as web-based OAuth, service-based connections, or certificate-based authorization flows. It is not required when using an embedded OAuth application.

The client secret is used during the token exchange step of the OAuth flow, when the driver requests an access token from the authorization server. If this value is missing or incorrect, authentication fails with either an invalid_client or an unauthorized_client error.

OAuthClientSecret is one of the key connection parameters that need to be set before users can authenticate via OAuth. You can obtain this value from your identity provider when registering the OAuth application.

Notes:

This value should be stored securely and never exposed in public repositories, scripts, or unsecured environments.
Client secrets may also expire after a set period. Be sure to monitor expiration dates and rotate secrets as needed to maintain uninterrupted access.

For more information on how this property is used when configuring a connection, see Establishing a Connection.

CData Cloud

DelegatedServiceAccounts

Specifies a space-delimited list of service account emails for delegated requests.

Data Type

string

Default Value

Remarks

The service account emails must be specified in a space-delimited list.

Each service account must be granted the roles/iam.serviceAccountTokenCreator role on the next service account in the chain.

The last service account in the chain must be granted the roles/iam.serviceAccountTokenCreator role on the requesting service account. The requesting service account is the one specified in the RequestingServiceAccount property.

Note that for delegated requests, the requesting service account must have the permission iam.serviceAccounts.getAccessToken, which can also be granted through the serviceAccountTokenCreator role.
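
The chain of role grants described above can be listed mechanically. The sketch below is illustrative only: required_grants is a hypothetical helper and the service account emails are made up. It derives each (grantee, target) pair from the space-delimited DelegatedServiceAccounts value and the RequestingServiceAccount value.

```python
from typing import List, Tuple

ROLE = "roles/iam.serviceAccountTokenCreator"


def required_grants(delegated: str, requesting: str) -> List[Tuple[str, str]]:
    """Return (grantee, target) pairs; each grantee needs ROLE on its target."""
    chain = delegated.split() + [requesting]
    # Every account is paired with the one that follows it in the chain,
    # so the last delegated account is paired with the requesting account.
    return list(zip(chain, chain[1:]))


# Hypothetical account names for illustration.
grants = required_grants(
    "first@example-project.iam.gserviceaccount.com "
    "second@example-project.iam.gserviceaccount.com",
    "target@example-project.iam.gserviceaccount.com",
)
for grantee, target in grants:
    print(f"{grantee} needs {ROLE} on {target}")
```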

CData Cloud

RequestingServiceAccount

Specifies a service account email to make a delegated request.

Data Type

string

Default Value

Remarks

Set this property to the email of the service account for which credentials are requested in a delegated request. It is used together with the list of delegated service accounts in DelegatedServiceAccounts.

You must have the IAM permission iam.serviceAccounts.getAccessToken on this service account.

CData Cloud

Scope

Specifies the scopes to request when obtaining an OAuth token from the OIDC token endpoint.

Data Type

string

Default Value

Remarks

Scopes define what kind of access the authenticating user has; for example, read-only access, read and write access, or restricted access to sensitive information. System administrators can use scopes to selectively enable access by functionality or security clearance.

When InitiateOAuth is set to GETANDREFRESH, you must use this property if you want to change which scopes are requested.

When InitiateOAuth is set to either REFRESH or OFF, you can change which scopes are requested using either this property or the Scope input.

When set to a value, this property is included in the OAuth authorization request sent to your identity provider. Scopes determine the level of access granted in the issued token.

The default server-side Apache Kafka OAuth validator does not require scopes, but your identity provider may enforce them. Set this property to whatever scope string your provider requires for client credential OIDC authentication.
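
For orientation, a standard OAuth 2.0 client credentials token request carries the scope as one space-delimited form field. The sketch below builds such a request body without sending it. The function name and the example scope strings are assumptions. Whether the Cloud places the client credentials in the body or in an HTTP header is not stated in this documentation.

```python
from urllib.parse import parse_qs, urlencode


def token_request_body(client_id: str, client_secret: str, scope: str = "") -> str:
    """Build a form-encoded body for an OAuth 2.0 client credentials request."""
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:
        # Multiple scopes are sent as a single space-delimited string.
        fields["scope"] = scope
    return urlencode(fields)


# Hypothetical values for illustration.
with_scope = token_request_body("my-client", "my-secret", "kafka.read kafka.write")
without_scope = token_request_body("my-client", "my-secret")
print(with_scope)
```

Leaving the scope empty omits the field entirely, which matches identity providers that do not enforce scopes.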

CData Cloud

OAuthAccessTokenURL

The URL from which the OAuth access token is retrieved.

Data Type

string

Default Value

Remarks

In OAuth 1.0, the authorized request token is exchanged for the access token at this URL.

CData Cloud

JWT OAuth

This section provides a complete list of the JWT OAuth properties you can configure in the connection string for this provider.

Property	Description
OAuthJWTCert	Supplies the name of the client certificate's JWT Certificate store.
OAuthJWTCertType	Identifies the type of key store containing the JWT Certificate.
OAuthJWTCertPassword	Provides the password for the OAuth JWT certificate used to access a password-protected certificate store. If the certificate store does not require a password, leave this property blank.
OAuthJWTCertSubject	Identifies the subject of the OAuth JWT certificate used to locate a matching certificate in the store. Supports partial matches and the wildcard '*' to select the first certificate.

CData Cloud

OAuthJWTCert

Supplies the name of the client certificate's JWT Certificate store.

Data Type

string

Default Value

Remarks

The OAuthJWTCertType field specifies the type of the certificate store specified in OAuthJWTCert. If the store is password-protected, use OAuthJWTCertPassword to supply the password.

OAuthJWTCert is used in conjunction with the OAuthJWTCertSubject field in order to specify client certificates. If OAuthJWTCert has a value, and OAuthJWTCertSubject is set, the CData Cloud initiates a search for a certificate. For further information, see OAuthJWTCertSubject.

Designations of certificate stores are platform-dependent.

Notes

The most common User and Machine certificate stores in Windows include:
- MY: A certificate store holding personal certificates with their associated private keys.
- CA: Certifying authority certificates.
- ROOT: Root certificates.
- SPC: Software publisher certificates.
In Java, the certificate store normally is a file containing certificates and optional private keys.
When the certificate store type is PFXFile, this property must be set to the name of the file.
When the type is PFXBlob, the property must be set to the binary contents of a PFX file (i.e. PKCS12 certificate store).

CData Cloud

OAuthJWTCertType

Identifies the type of key store containing the JWT Certificate.

Possible Values

PFXBLOB, JKSBLOB, PEMKEY_BLOB, PUBLIC_KEY_BLOB, SSHPUBLIC_KEY_BLOB, XMLBLOB, BCFKSBLOB, GOOGLEJSONBLOB

Data Type

string

Default Value

"PEMKEY_BLOB"

Remarks


Value	Description	Notes
USER	A certificate store owned by the current user.	Only available in Windows.
MACHINE	A machine store.	Not available in Java or other non-Windows environments.
PFXFILE	A PFX (PKCS12) file containing certificates.
PFXBLOB	A string (base-64-encoded) representing a certificate store in PFX (PKCS12) format.
JKSFILE	A Java key store (JKS) file containing certificates.	Only available in Java.
JKSBLOB	A string (base-64-encoded) representing a certificate store in Java key store (JKS) format.	Only available in Java.
PEMKEY_FILE	A PEM-encoded file that contains a private key and an optional certificate.
PEMKEY_BLOB	A string (base64-encoded) that contains a private key and an optional certificate.
PUBLIC_KEY_FILE	A file that contains a PEM- or DER-encoded public key certificate.
PUBLIC_KEY_BLOB	A string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate.
SSHPUBLIC_KEY_FILE	A file that contains an SSH-style public key.
SSHPUBLIC_KEY_BLOB	A string (base-64-encoded) that contains an SSH-style public key.
P7BFILE	A PKCS7 file containing certificates.
PPKFILE	A file that contains a PPK (PuTTY Private Key).
XMLFILE	A file that contains a certificate in XML format.
XMLBLOB	A string that contains a certificate in XML format.
BCFKSFILE	A file that contains a Bouncy Castle keystore.
BCFKSBLOB	A string (base-64-encoded) that contains a Bouncy Castle keystore.
GOOGLEJSON	A JSON file containing the service account information.	Only valid when connecting to a Google service.
GOOGLEJSONBLOB	A string that contains the service account JSON.	Only valid when connecting to a Google service.
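
Several of the BLOB types above expect the store contents as a base-64-encoded string. A minimal sketch of producing such a string from raw bytes follows. The helper names are hypothetical and the sample bytes stand in for real keystore contents, which you would normally read from a file.

```python
import base64


def to_blob(raw: bytes) -> str:
    """Encode raw keystore bytes as a base-64 string suitable for a *BLOB type."""
    return base64.b64encode(raw).decode("ascii")


def from_blob(blob: str) -> bytes:
    """Decode a base-64 blob back to the original bytes."""
    return base64.b64decode(blob)


# Placeholder bytes; a real PFX or JKS store would be read with open(path, "rb").
sample = b"\x30\x82\x01\x0a placeholder keystore bytes"
blob = to_blob(sample)
print(blob)
```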

CData Cloud

OAuthJWTCertPassword

Provides the password for the OAuth JWT certificate used to access a password-protected certificate store. If the certificate store does not require a password, leave this property blank.

Data Type

string

Default Value

Remarks

This property specifies the password needed to open a password-protected certificate store. To determine if a password is necessary, refer to the documentation or configuration for your specific certificate store.

This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys are not encrypted.

CData Cloud

OAuthJWTCertSubject

Identifies the subject of the OAuth JWT certificate used to locate a matching certificate in the store. Supports partial matches and the wildcard '*' to select the first certificate.

Data Type

string

Default Value

"*"

Remarks

The value of this property is used to locate a matching certificate in the store. The search process works as follows:

If an exact match for the subject is found, the corresponding certificate is selected.
If no exact match is found, the store is searched for certificates whose subjects contain the property value.
If no match is found, no certificate is selected.

You can set the value to '*' to automatically select the first certificate in the store. The certificate subject is a comma-separated list of distinguished name fields and values. For example: CN=www.server.com, OU=test, C=US.

Common fields include:


Field	Meaning
CN	Common Name. This is commonly a host name like www.server.com.
O	Organization
OU	Organizational Unit
L	Locality
S	State
C	Country
E	Email Address

If a field value contains a comma, enclose it in quotes. For example: "O=ACME, Inc.".
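
The search process described in this section can be sketched as a small function. This is an illustration under stated assumptions: find_certificate is a hypothetical name, the store is modeled as a list of subject strings, and matching is treated as case-sensitive, which the documentation does not specify.

```python
from typing import List, Optional


def find_certificate(subjects: List[str], wanted: str) -> Optional[str]:
    """Select a certificate subject using exact match, then partial match."""
    if not subjects:
        return None
    if wanted == "*":
        return subjects[0]  # wildcard picks the first certificate in the store
    for subject in subjects:
        if subject == wanted:
            return subject  # exact match wins
    for subject in subjects:
        if wanted in subject:
            return subject  # otherwise the first subject containing the value
    return None


store = [
    "CN=www.server.com, OU=test, C=US",
    "CN=backup.server.com, OU=ops, C=US",
]
print(find_certificate(store, "OU=ops"))
```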

CData Cloud

Kerberos

This section provides a complete list of the Kerberos properties you can configure in the connection string for this provider.

Property	Description
KerberosKeytabFile	Specifies the path to the keytab file that contains the Kerberos principals and encrypted keys used for authentication.
KerberosSPN	Specifies the full Kerberos service principal name (SPN) of the Kafka broker.
KerberosServiceName	Specifies the Kerberos service name used when authenticating to the Kafka broker.
UseKerberosTicketCache	Specifies whether the provider uses the Kerberos ticket cache for authentication instead of a keytab file.

CData Cloud

KerberosKeytabFile

Specifies the path to the keytab file that contains the Kerberos principals and encrypted keys used for authentication.

Data Type

string

Default Value

Remarks

Set this property to the location of a valid keytab file when using Kerberos authentication. The keytab must contain the principal entries required for the Kerberos login configured for the Cloud.

CData Cloud

KerberosSPN

Specifies the full Kerberos service principal name (SPN) of the Kafka broker.

Data Type

string

Default Value

Remarks

Set this property to the service principal name that identifies the Kafka broker in Kerberos. The SPN typically includes the service name and hostname, such as kafka/host.example.com@REALM. The Cloud uses this value when requesting a service ticket during Kerberos authentication.
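
To show how the parts of an SPN relate to each other, the sketch below splits a value such as kafka/host.example.com@REALM into its service name, host, and realm. The Spn type and parse_spn function are hypothetical helpers for illustration. The Cloud's own parsing may differ.

```python
from typing import NamedTuple, Optional


class Spn(NamedTuple):
    service: str          # the portion that KerberosServiceName refers to
    host: str
    realm: Optional[str]  # None when the SPN has no @REALM suffix


def parse_spn(spn: str) -> Spn:
    """Split 'service/host@REALM' into its components."""
    principal, _, realm = spn.partition("@")
    service, separator, host = principal.partition("/")
    if not separator or not service or not host:
        raise ValueError(f"expected service/host[@REALM], got {spn!r}")
    return Spn(service, host, realm or None)


print(parse_spn("kafka/host.example.com@REALM"))
```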

CData Cloud

KerberosServiceName

Specifies the Kerberos service name used when authenticating to the Kafka broker.

Data Type

string

Default Value

Remarks

Set this property to the service name portion of the Kerberos principal used by the Kafka broker.

CData Cloud

UseKerberosTicketCache

Specifies whether the provider uses the Kerberos ticket cache for authentication instead of a keytab file.

Data Type

bool

Default Value

false

Remarks

When set to true, the Cloud uses the Kerberos tickets already obtained by the logged-in user. In this mode, KerberosKeytabFile is not required.

When set to false, the Cloud uses the keytab file specified in KerberosKeytabFile to obtain Kerberos credentials.

This option is useful when users authenticate interactively and already have valid Kerberos tickets on the system.

CData Cloud

SSL

This section provides a complete list of the SSL properties you can configure in the connection string for this provider.

Property	Description
SSLServerCert	Specifies the SSL server certificate or certificate store used to verify the identity of the Apache Kafka broker.
SSLServerCertType	Specifies the format of the SSL server certificate used to verify the Apache Kafka broker.
SSLClientCert	Specifies the SSL client certificate used to authenticate with the Apache Kafka broker.
SSLClientCertType	Specifies the format of the SSL client certificate used to connect to the Apache Kafka broker.
SSLClientCertPassword	Specifies the password used to decrypt the certificate provided in SSLClientCert.
SSLIdentificationAlgorithm	Specifies the endpoint identification algorithm used to validate the server host name during SSL/TLS connections.

CData Cloud

SSLServerCert

Specifies the SSL server certificate or certificate store used to verify the identity of the Apache Kafka broker.

Data Type

string

Default Value

Remarks

The value for this property must match the format described by SSLServerCertType. This determines whether the certificate is supplied as a file path, a base64-encoded blob, or a reference to a system certificate store.

The server certificate is used to confirm that the broker’s SSL certificate is signed by a trusted Certificate Authority (CA). If the provided certificate does not match the server’s certificate chain, the SSL connection will be rejected.

CData Cloud

SSLServerCertType

Specifies the format of the SSL server certificate used to verify the Apache Kafka broker.

Possible Values

PEMKEY_BLOB

Data Type

string

Default Value

"PEMKEY_BLOB"

Remarks

This property is used to determine what format the SSLServerCert property expects. This property can take one of the following values:


Store Type	Description
PEMKEY_FILE	The certificate store is the name of a PEM-encoded file that contains the server certificate.
PEMKEY_BLOB	The certificate store is a string that contains the server certificate.

CData Cloud

SSLClientCert

Specifies the SSL client certificate used to authenticate with the Apache Kafka broker.

Data Type

string

Default Value

Remarks

This property provides the certificate or certificate store that the Cloud uses for SSL client authentication. The expected format depends on the value of SSLClientCertType, which determines whether the certificate is supplied as a file path, a base64-encoded string, or a reference to a system certificate store.

If the certificate store is password protected, set the password in SSLClientCertPassword.

See SSLClientCertType for details about supported certificate formats and how to supply them.

CData Cloud

SSLClientCertType

Specifies the format of the SSL client certificate used to connect to the Apache Kafka broker.

Possible Values

JKSFILE, PFXFILE, PEMKEY_FILE, PEMKEY_BLOB

Data Type

string

Default Value

"PEMKEY_FILE"

Remarks

This property is used to determine what format the SSLClientCert property expects. This property can take one of the following values:


Store Type	Description
PEMKEY_FILE	The certificate store is the name of a PEM-encoded file that contains a private key and certificate.
PEMKEY_BLOB	The certificate store is a string that contains a private key and certificate, optionally encoded in base64.

CData Cloud

SSLClientCertPassword

Specifies the password used to decrypt the certificate provided in SSLClientCert.

Data Type

string

Default Value

Remarks

Leave this property blank if the client certificate or keystore is not password protected.

CData Cloud

SSLIdentificationAlgorithm

Specifies the endpoint identification algorithm used to validate the server host name during SSL/TLS connections.

Data Type

string

Default Value

Remarks

By default, this property uses https, which enables host name verification. If you set this property to a blank value, host name verification is disabled.

Host name verification ensures that the certificate presented by the server matches the expected server host name. Disabling this verification can be useful in development or testing environments, but is not recommended for production.
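
As an analogy only, the same switch exists in Python's standard ssl module, where check_hostname plays the role that the https value plays here. The sketch below is not how the Cloud is implemented. make_context is a hypothetical helper that maps the property value onto a Python TLS context.

```python
import ssl


def make_context(identification_algorithm: str) -> ssl.SSLContext:
    """Build a TLS context that verifies host names only for 'https'."""
    context = ssl.create_default_context()
    # The certificate chain is still validated in both cases;
    # only the host name comparison is switched on or off.
    context.check_hostname = identification_algorithm.lower() == "https"
    return context


verified = make_context("https")
unverified = make_context("")  # blank value: host name verification disabled
print(verified.check_hostname, unverified.check_hostname)
```

With verification disabled, a certificate issued for any host is accepted as long as its chain is trusted, which is why a blank value is best limited to development and testing.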

CData Cloud

Schema Registry

This section provides a complete list of the Schema Registry properties you can configure in the connection string for this provider.

Property	Description
RegistryURL	Specifies the endpoint of the schema registry. When this property is specified, the driver supports reading Avro and JSON schemas from the server.
RegistryService	Specifies the schema registry service that the provider uses to retrieve key and value schemas for Apache Kafka topics.
RegistryAuthScheme	Specifies the authentication scheme that the provider uses when connecting to the schema registry.
RegistryUser	Specifies the user name used when authenticating to the schema registry with the Basic authentication scheme.
RegistryPassword	Specifies the password used when authenticating to the schema registry with the Basic authentication scheme.
RegistryVersion	Specifies which version of a schema the provider retrieves from the schema registry when resolving topic columns.
RegistryServerCert	The certificate to be accepted from the schema registry when connecting using TLS/SSL.
SchemaMergeMode	Specifies how the provider exposes schemas with multiple versions.

CData Cloud

RegistryURL

Specifies the endpoint of the schema registry. When this property is specified, the driver supports reading Avro and JSON schemas from the server.

Data Type

string

Default Value

Remarks

Set this property to the URL or identifier of the schema registry service you are connecting to.

If you are connecting to Confluent Cloud, this corresponds to the Schema Registry endpoint value in Schemas > Schema Registry > Instructions.
If you are connecting to AWS Glue Schema Registry, set this property to the Amazon Resource Name (ARN) of the registry you want to use.

CData Cloud

RegistryService

Specifies the schema registry service that the provider uses to retrieve key and value schemas for Apache Kafka topics.

Possible Values

Confluent, AWSGlue

Data Type

string

Default Value

"Confluent"

Remarks

This property determines which registry implementation the Cloud connects to when using a registry-based TypeDetectionScheme. The supported services are:


Value	Description
Confluent	Use Confluent Schema Registry. Supports both key and value schemas and client certificate authentication.
AWSGlue	Use AWS Glue Schema Registry. Supports schema registration and discovery through AWS Glue.

CData Cloud

RegistryAuthScheme

Specifies the authentication scheme that the provider uses when connecting to the schema registry.

Possible Values

None, Basic, SSLCertificate

Data Type

string

Default Value

"None"

Remarks

The following authentication schemes are supported. Some schemes are available only for specific registry services, depending on the capabilities of the configured RegistryService.


Value	Description
None	No authentication is used when connecting to the schema registry.
Basic	The Cloud uses RegistryUser and RegistryPassword. For Confluent Schema Registry, these values correspond to the API key and secret. For AWS Glue Schema Registry, these are the IAM access key and secret key.
SSLCertificate	The Cloud authenticates with RegistryClientCert through SSL client authentication. This option is supported only when connecting to a Confluent registry.

CData Cloud

RegistryUser

Specifies the user name used when authenticating to the schema registry with the Basic authentication scheme.

Data Type

string

Default Value

Remarks

Set this property when RegistryAuthScheme is set to Basic, as Basic authentication requires both a user name and a password.

The meaning of this value depends on the registry service:

For Confluent Schema Registry, this value corresponds to the Access Key shown under Schemas > Schema Registry > API access.
For AWS Glue Schema Registry, this value corresponds to the IAM access key ID associated with your AWS credentials.

This property works together with RegistryPassword, which supplies the corresponding secret or key.

CData Cloud

RegistryPassword

Specifies the password used when authenticating to the schema registry with the Basic authentication scheme.

Data Type

string

Default Value

Remarks

Set this property when RegistryAuthScheme is set to Basic. For Confluent Schema Registry, this value corresponds to the Secret Key shown under Schemas > Schema Registry > API access.
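
For a registry that uses HTTP Basic authentication, the user name and password are conventionally combined into a single Authorization header. The sketch below shows that standard encoding. It assumes the Confluent-style case: basic_auth_header is a hypothetical helper, the credentials are placeholders, and nothing here describes how AWS Glue credentials are transmitted.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Placeholder API key and secret for illustration.
header = basic_auth_header("key", "secret")
print(header)
```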

CData Cloud

RegistryVersion

Specifies which version of a schema the provider retrieves from the schema registry when resolving topic columns.

Data Type

string

Default Value

"latest"

Remarks

Set this property to latest to use the most recent version of each schema, or set it to a specific version number. When a version number is provided, that version must exist for every schema subject used by the topics being read.

If SchemaMergeMode is set to Simple, this property is ignored. In that mode, the Cloud merges all available schema versions for each topic to produce a combined set of columns.

CData Cloud

RegistryServerCert

The certificate to be accepted from the schema registry when connecting using TLS/SSL.

Data Type

string

Default Value

Remarks

If you are using a TLS/SSL connection, use this property to specify the TLS/SSL certificate to be accepted from the server. If you specify a value for this property, all other certificates that are not trusted by the machine are rejected.

This property can take the following forms:


Description	Example
A full PEM Certificate (example shortened for brevity)	`-----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE-----`
A path to a local file containing the certificate	C:\cert.cer
The public key (example shortened for brevity)	`-----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY-----`
The MD5 Thumbprint (hex values can also be either space- or colon-separated)	`ecadbdda5a1529c58a1e9e09828d70e4`
The SHA1 Thumbprint (hex values can also be either space- or colon-separated)	`34a929226ae0819f2ec14b4a3d904f801cbb150d`

Note: It is possible to use '*' to signify that all certificates should be accepted, but due to security concerns this is not recommended.
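
A certificate thumbprint is a hash of the certificate's DER encoding, written as hex. The sketch below computes a SHA1 thumbprint and normalizes the space- or colon-separated spellings mentioned in the table. The helper names are hypothetical, and the input bytes are placeholders rather than a real certificate.

```python
import hashlib


def normalize_thumbprint(value: str) -> str:
    """Remove space or colon separators and lowercase the hex digits."""
    return value.replace(":", "").replace(" ", "").lower()


def sha1_thumbprint(der_bytes: bytes) -> str:
    """Return the SHA1 thumbprint of a DER-encoded certificate as hex."""
    return hashlib.sha1(der_bytes).hexdigest()


# Placeholder bytes; a real value would be the DER encoding of the certificate.
thumbprint = sha1_thumbprint(b"placeholder certificate bytes")
print(thumbprint)
print(normalize_thumbprint("34:A9:29:22 6A E0"))
```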

CData Cloud

SchemaMergeMode

Specifies how the provider exposes schemas with multiple versions.

Possible Values

None, Simple

Data Type

string

Default Value

"None"

Remarks

By default the Cloud sets SchemaMergeMode to None.

None

This mode only supports one version for schemas in the registry. It is normally the latest version, but you can change RegistryVersion to use a specific version number. The Cloud ignores the content of any message that does not match the schema for its topic. Reading the topic returns the message, but all of its data fields (fields other than Partition, Offset, and Timestamp) are reported as NULL.

Limitations

This mode supports both SELECT and INSERT queries into each topic. An INSERT always uses the version of the schema specified by RegistryVersion.

This mode supports all options for RegistryService.

Schema Confusion

For compatibility with previous versions, the Cloud does not enforce the schema ID included on messages when RegistryService is set to Confluent. With SchemaMergeMode set to None this ID is always ignored, but even with SchemaMergeMode set to Simple the Cloud ignores the ID if it cannot find a matching schema. This may cause the Cloud to output field values under unexpected columns.

For example, consider the following two Avro schemas that store names and address details. The schemas are binary compatible: even though the field names differ, they have the same number of fields with the same types in the same order.

{
  "type": "record",
  "name": "personname",
  "fields": [
    { "name": "PersonID", "type": "int" },
    { "name": "LastName", "type": "string" },
    { "name": "FirstName", "type": "string" }
  ]
}

{
  "type": "record",
  "name": "personaddress",
  "fields": [
    { "name": "PersonID", "type": "int" },
    { "name": "Address", "type": "string" },
    { "name": "City", "type": "string" }
  ]
}

If you produce the messages below to the topic using the personname schema, the Cloud may parse them using the personaddress schema. This happens if, for example, personname and personaddress are two versions of the same registry schema. The Cloud sees that personaddress is the latest version and uses it for this topic.

{"PersonID": 1, "LastName": "Smithers", "FirstName": "William"}
{"PersonID": 2, "LastName": "McAllister", "FirstName": "Amy"}

In that scenario, the Cloud outputs these results:


PersonID	Address	City
1	Smithers	William
2	McAllister	Amy
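
The result above follows from the fact that Avro's binary encoding stores field values in schema order without field names. The toy model below reproduces the effect with plain lists. It is a simplification that does not use an Avro library, and the encode and decode helpers are hypothetical.

```python
from typing import Dict, List

personname = ["PersonID", "LastName", "FirstName"]
personaddress = ["PersonID", "Address", "City"]


def encode(record: Dict[str, object], writer_fields: List[str]) -> List[object]:
    """Keep only the values, in the writer schema's field order."""
    return [record[name] for name in writer_fields]


def decode(values: List[object], reader_fields: List[str]) -> Dict[str, object]:
    """Attach the reader schema's field names to the values by position."""
    return dict(zip(reader_fields, values))


payload = encode({"PersonID": 1, "LastName": "Smithers", "FirstName": "William"}, personname)
row = decode(payload, personaddress)
print(row)
```

Because the names are not part of the payload, nothing in the message itself reveals that the wrong schema was applied.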

Simple

Setting SchemaMergeMode to Simple causes the Cloud to load all versions of each topic schema and merge them according to the following rules. These rules ensure that the Cloud produces NULL or a valid value for each column. If any rule fails, the Cloud fails the schema merge by logging an error and outputting a schema with no data fields.

Limitations

This mode supports only SELECT queries. The Cloud does not have a way to specify a specific version of a schema to use for INSERT queries. If you need to produce messages in this mode, use the ProduceMessage stored procedure.

This mode only supports RegistryService when set to Confluent. Messages produced with the Confluent libraries include the ID of the schema their data conforms to. The Cloud uses this to determine what schema to parse each message with.

If a message does not have an ID, or if the ID refers to a schema that does not match the topic name, the Cloud defaults to the latest schema. This may cause field values to appear in unexpected columns if the schemas are different but produce compatible output. See the Schema Confusion section above for a more detailed discussion of this issue.

Schema Validation Rules

If all versions of the schema are valid according to these rules, the Cloud includes every field from every version of the schema in the table.

Each field must have the same type across all versions where it appears. Fields may appear in some versions and not others. Such fields are reported as NULL for messages whose schema version does not include them.
All versions must be Avro schemas.

During validation, the type of a field is the type that the Cloud uses to report the field, not its original Avro type. This means that two versions of a schema can have a field that is an aggregate (array, map, ...) in one version and a string in another. For example, the Cloud considers these two schemas compatible, but there is currently no way to tell whether the address field is JSON or just text.

{
    "type": "record",
    "name": "person",
    "fields": [
        { "name": "address", "type": {"type": "array", "items": "string"}}
    ]
}

{
    "type": "record",
    "name": "person",
    "fields": [
        { "name": "address", "type": "string" }
    ]
}

The rules apply transitively across all versions. This means that two versions of the schema may be valid in isolation, but not when considering all versions of the schema. For example, consider a schema where v1 contains an integer amount field, v2 removes it, and v3 adds a decimal amount field. v1 and v2 are valid together because removing fields is allowed, and v2 and v3 are valid together because adding fields is allowed. However, all three versions combined violate the rules because the amount field changed type between v1 and v3.
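The transitive check can be sketched in a few lines. This is a minimal model, not the Cloud's implementation: each version is reduced to a mapping of field name to reported type, and the id field and the merge_versions helper are hypothetical names introduced for the example.

```python
def merge_versions(versions):
    """Merge schema versions given as {field_name: reported_type} dicts.

    Returns the merged {field_name: type} mapping, or raises ValueError
    when a field is reported with different types in different versions.
    """
    merged = {}
    for number, fields in enumerate(versions, start=1):
        for name, field_type in fields.items():
            if name in merged and merged[name] != field_type:
                raise ValueError(
                    f"field {name!r} is {merged[name]} in an earlier version "
                    f"but {field_type} in v{number}"
                )
            merged.setdefault(name, field_type)
    return merged


v1 = {"id": "int", "amount": "int"}      # integer amount
v2 = {"id": "int"}                       # amount removed
v3 = {"id": "int", "amount": "decimal"}  # amount added back as decimal

print(merge_versions([v1, v2]))  # valid: removing a field is allowed
print(merge_versions([v2, v3]))  # valid: adding a field is allowed
try:
    merge_versions([v1, v2, v3])
except ValueError as error:
    print("merge failed:", error)
```

Each adjacent pair merges cleanly, but the three versions together fail because amount has two different types.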

For best results, enable one of the transitive schema compatibility modes within the schema registry. The Cloud does not check the compatibility mode as part of its validation rules. However, setting a transitive schema compatibility mode prevents you from creating schemas that the Cloud cannot process.

CData Cloud

Logging

This section provides a complete list of the Logging properties you can configure in the connection string for this provider.

Property	Description
Verbosity	Specifies the verbosity level of the log file, which controls the amount of detail logged. Supported values range from 1 to 5.

CData Cloud

Verbosity

Specifies the verbosity level of the log file, which controls the amount of detail logged. Supported values range from 1 to 5.

Data Type

string

Default Value

"1"

Remarks

This property defines the level of detail the Cloud includes in the log file. Higher verbosity levels increase the detail of the logged information, but may also result in larger log files and slower performance due to the additional data being captured.

The default verbosity level is 1, which is recommended for regular operation. Higher verbosity levels are primarily intended for debugging purposes. For more information on each level, refer to Logging.

When combined with the LogModules property, Verbosity can refine logging to specific categories of information.

CData Cloud

Schema

This section provides a complete list of the Schema properties you can configure in the connection string for this provider.

Property	Description
BrowsableSchemas	Optional setting that restricts the schemas reported to a subset of all available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.

CData Cloud

BrowsableSchemas

Optional setting that restricts the schemas reported to a subset of all available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.

Data Type

string

Default Value

Remarks

Listing all available database schemas can take extra time, thus degrading performance. Providing a list of schemas in the connection string saves time and improves performance.

CData Cloud

Miscellaneous

This section provides a complete list of the Miscellaneous properties you can configure in the connection string for this provider.

Property	Description
AllowKeyOnlyRegistryTopics	Specifies whether the provider exposes key-only Schema Registry topics as tables.
AWSWorkloadIdentityConfig	Configuration properties to provide when using Workload Identity Federation via AWS.
AzureWorkloadIdentityConfig	Configuration properties to provide when using Workload Identity Federation via Azure.
CompressionType	Specifies the compression algorithm that the provider uses when producing messages to Apache Kafka.
ConsumerProperties	Specifies additional Kafka consumer configuration options that the provider passes directly to the underlying Kafka client.
CreateTablePartitions	Specifies the number of partitions to assign to a topic created through a CREATE TABLE statement.
CreateTableReplicationFactor	Specifies the number of replicas to assign to a topic created through a CREATE TABLE statement.
EnableIdempotence	Specifies whether the provider ensures that produced messages are delivered in order and without duplicates.
ExposeQueueMetadataColumns	Specifies whether the Partition, Offset, and Timestamp columns are exposed.
FlattenArrays	Specifies how many elements to return from nested arrays when TypeDetectionScheme is set to SchemaRegistry.
HideUnusedColumns	Determines whether to hide key or value columns when the topic has no associated schema information.
MaximumBatchSize	Specifies the maximum size, in bytes, of a batch of messages that the provider gathers before sending the batch to Apache Kafka.
MaxRows	Specifies the maximum number of rows returned for queries that do not include either aggregation or GROUP BY.
MessageKeyColumn	Specifies the name of the column where the provider stores the message key for each record.
MessageKeyType	The type of data stored in message keys.
NonRegistryTypeDetectionScheme	Specifies the TypeDetectionScheme to use for topics that do not have schemas in the schema registry.
OffsetResetStrategy	Specifies how the provider determines the starting offset when no committed offset exists for the consumer group.
Pagesize	Specifies the maximum number of rows that the provider retrieves from Apache Kafka in a single read operation.
ProducerProperties	Specifies additional Apache Kafka producer configuration options that the provider passes directly to the client.
PseudoColumns	Specifies the pseudocolumns to expose as table columns, expressed as a string in the format 'TableName=ColumnName;TableName=ColumnName'.
ReadDuration	Specifies how long, in seconds, the provider waits for additional messages after a read operation begins.
RowScanDepth	Specifies the maximum number of messages that the provider scans to determine the columns and data types for a topic.
SASLOAuthExtensions	Specifies the extension values to send with OAuth auth schemes.
SchemaRegistryOnly	Specifies whether the provider connects only to the schema registry.
SerializationFormat	Specifies how to serialize/deserialize message contents.
ThrowsKeyNotFound	Specifies whether the provider throws an exception if no rows are updated.
Timeout	Specifies the maximum time, in seconds, that the provider waits for a server response before throwing a timeout error.
TypeDetectionScheme	Specifies how the provider determines the available fields and data types for each topic.
UseConfluentAvroFormat	Specifies how Avro data should be formatted during an INSERT.
ValidateRegistryTopics	Specifies whether or not to validate schema registry topics against the Apache Kafka broker. Only has an effect when TypeDetectionScheme=SchemaRegistry.
WorkloadPoolId	The ID of your Workload Identity Federation pool.
WorkloadProjectId	The ID of the Google Cloud project that hosts your Workload Identity Federation pool.
WorkloadProviderId	The ID of your Workload Identity Federation pool provider.

CData Cloud

AllowKeyOnlyRegistryTopics

Specifies whether the provider exposes key-only Schema Registry topics as tables.

Data Type

bool

Default Value

false

Remarks

This property applies only when TypeDetectionScheme is set to SchemaRegistry.

The Cloud supports two types of schemas in the Schema Registry: value schemas and key schemas. By default, the Cloud requires a value schema for a topic to be exposed as a table. Topics that contain only a key schema are not exposed unless this property is enabled. Key schemas are optional and are used only when MessageKeyColumn is set.

When set to true, the Cloud exposes topics that contain either a key schema, a value schema, or both. For topics without a value schema, the message value is returned as a base64 blob because no deserialization can be performed. This behavior is similar to scenarios where SerializationFormat is set to NONE or TypeDetectionScheme is set to MessageOnly.

When set to false, topics without a value schema are not exposed. This is the default behavior and ensures that only topics with complete value schemas are represented as tables.

This behavior is independent of the MessageKeyColumn setting. Even when this property is enabled, you must set MessageKeyColumn if you want the Cloud to return message key data.

Enable this property when you need to read topics defined only with key schemas or when working with registries where not all topics include value schemas. Enabling this property may increase the number of exposed tables and may require additional processing when handling base64-encoded message values.

CData Cloud

AWSWorkloadIdentityConfig

Configuration properties to provide when using Workload Identity Federation via AWS.

Data Type

string

Default Value

Remarks

The properties are formatted as a semicolon-separated list of Key=Value properties, where the value is optionally quoted. For example, this setting authenticates in AWS using a user's root keys:

AWSWorkloadIdentityConfig="AuthScheme=AwsRootKeys;AccessKey='AKIAABCDEF123456';SecretKey=...;Region=us-east-1"

CData Cloud

AzureWorkloadIdentityConfig

Configuration properties to provide when using Workload Identity Federation via Azure.

Data Type

string

Default Value

Remarks

The properties are formatted as a semicolon-separated list of Key=Value properties, where the value is optionally quoted. For example, this setting authenticates in Azure using client credentials:

AzureWorkloadIdentityConfig="AuthScheme=AzureServicePrincipal;AzureTenant=directory (tenant) id;OAuthClientID=application (client) id;OAuthClientSecret=client secret;AzureResource=application id uri;"

CData Cloud

CompressionType

Specifies the compression algorithm that the provider uses when producing messages to Apache Kafka.

Possible Values

none, gzip, snappy, lz4

Data Type

string

Default Value

"none"

Remarks

Select a compression type to reduce the size of produced messages. Compression is applied to batches of messages rather than individual messages.

The following compression types are supported:


Type	Description
NONE	Messages are sent without compression.
GZIP	Messages are compressed using gzip.
SNAPPY	Messages are compressed using Snappy.
LZ4	Messages are compressed using LZ4.

CData Cloud

ConsumerProperties

Specifies additional Kafka consumer configuration options that the provider passes directly to the underlying Kafka client.

Data Type

string

Default Value

Remarks

The Cloud exposes several Kafka consumer settings as connection properties and maps them internally to the corresponding Kafka client configuration keys. If the Cloud does not provide a dedicated property for a particular consumer option, you can set it through ConsumerProperties.

This property accepts a semicolon-separated list of Kafka consumer configuration pairs. These options are passed directly to the Kafka client. For example, security.protocol=SASL_SSL;sasl.mechanism=SCRAM-SHA-512 sets the security.protocol and sasl.mechanism Kafka consumer properties.

Be careful about which configuration options you set via this property. The Cloud does not consider ConsumerProperties a sensitive property, and its value appears in logs. Use this property to configure advanced Kafka consumer settings that are not directly exposed as connection properties, such as fine-tuning fetch behavior, retry intervals, timeouts, and security configuration details.
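When the options live in application code, the semicolon-separated list can be assembled programmatically. The following is a minimal sketch: the to_property_list helper is hypothetical, and rejecting keys or values that contain a semicolon is an assumption made here because this page does not describe an escaping rule.

```python
def to_property_list(options):
    """Join Kafka client options into a semicolon-separated key=value list."""
    for key, value in options.items():
        # Assumption: no escaping is available, so reject ambiguous input.
        if ";" in key or "=" in key or ";" in str(value):
            raise ValueError(f"cannot encode option {key!r}")
    return ";".join(f"{key}={value}" for key, value in options.items())


consumer_properties = to_property_list({
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-512",
})
print(consumer_properties)
```

The resulting string matches the example value shown above and can be assigned to ConsumerProperties or, in the same format, to ProducerProperties.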

CData Cloud

CreateTablePartitions

Specifies the number of partitions to assign to a topic created through a CREATE TABLE statement.

Data Type

int

Default Value

Remarks

When you execute a CREATE TABLE statement, the Cloud creates a new empty Kafka topic. By default, this topic is created with one partition.

Increase this value to create topics with additional partitions. More partitions allow Kafka to distribute messages across multiple consumers within the same consumer group, enabling greater parallelism and higher throughput. Choose a partition count that aligns with your expected workload and consumer group size.

CData Cloud

CreateTableReplicationFactor

Specifies the number of replicas to assign to a topic created through a CREATE TABLE statement.

Data Type

int

Default Value

Remarks

When you execute a CREATE TABLE statement, the Cloud creates a new empty Kafka topic. By default, the topic is created with a replication factor of 3.

You can adjust this value to create topics with more or fewer replicas. The replication factor must not exceed the number of brokers in your Kafka cluster. For example, a topic with a replication factor of 3 cannot be created on a cluster with only 2 brokers.

Kafka uses replicas to maintain availability and prevent data loss during broker failures. If all replicas for a partition become unavailable, the topic cannot be accessed. Increasing the number of replicas can improve resilience on larger clusters.

CData Cloud

EnableIdempotence

Specifies whether the provider ensures that produced messages are delivered in order and without duplicates.

Data Type

bool

Default Value

false

Remarks

When set to true, the Cloud enables Kafka's idempotent producer mode. Kafka assigns a sequence number to each produced message, allowing the broker to detect and discard duplicates and to preserve message order during retries.

When set to false, the Cloud does not apply these delivery guarantees, and duplicate messages may be produced if retries occur during network interruptions or broker failures.

CData Cloud

ExposeQueueMetadataColumns

Specifies whether the Partition, Offset, and Timestamp columns are exposed.

Data Type

bool

Default Value

true

Remarks

Apache Kafka stores three pieces of metadata with every message: the timestamp when the message was produced, the partition it was produced to, and the message's offset within that partition. The Cloud exposes these as the Timestamp, Partition, and Offset columns.

When set to true, these metadata columns are included in every table. When set to false, the Cloud hides these columns, but still receives the underlying metadata.

Consider the following points when deciding whether to disable this option:

There is no performance benefit to hiding these columns. The Apache Kafka protocol sends them every time the Cloud consumes a message.
The Cloud uses the Partition column to restrict SELECT statements to specific partitions. For example, to execute SELECT * FROM topic WHERE Partition IN (1, 2), the Cloud requests messages from only partitions 1 and 2. Without the Partition column, the Cloud always consumes messages from all partitions, or from whatever partitions are assigned by the current consumer group (if ConsumerGroupId is set).

CData Cloud

FlattenArrays

Specifies how many elements to return from nested arrays when TypeDetectionScheme is set to SchemaRegistry.

Data Type

string

Default Value

"0"

Remarks

When TypeDetectionScheme is set to SchemaRegistry, nested arrays are not exposed by default.

Set this property to expose a specific number of elements from a nested array as individual columns. The Cloud creates one column per element, starting with index 0.

For example, consider the following array:

["FLOW-MATIC","LISP","COBOL"]

If this property is set to 1, the Cloud exposes only the element at index 0:


Column Name	Column Value
languages.0	FLOW-MATIC

If this property is set to 0, no elements of nested arrays are exposed as columns.
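The column naming above can be modeled with a short sketch. The flatten_array helper is hypothetical, and how the Cloud reports an array that is shorter than the configured count is not described on this page, so the sketch simply omits the missing positions.

```python
def flatten_array(column, values, flatten_arrays):
    """Expose the first `flatten_arrays` elements as columns named column.index."""
    if flatten_arrays <= 0:
        return {}  # 0 (the default) exposes no array elements
    return {
        f"{column}.{index}": value
        for index, value in enumerate(values[:flatten_arrays])
    }


languages = ["FLOW-MATIC", "LISP", "COBOL"]
print(flatten_array("languages", languages, 1))
print(flatten_array("languages", languages, 2))
print(flatten_array("languages", languages, 0))
```

With a value of 2, the sketch also produces a languages.1 column containing LISP, following the one-column-per-element rule.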

CData Cloud

HideUnusedColumns

Determines whether to hide key or value columns when the topic has no associated schema information.

Possible Values

Legacy, Never, Key, Value, KeyAndValue

Data Type

string

Default Value

"Legacy"

Remarks

In some situations the Cloud has no information on the type of data stored in the keys or values of a topic's messages. This can happen for several reasons:

When TypeDetectionScheme=RowScan or TypeDetectionScheme=None, and none of the messages scanned have key/value content
When TypeDetectionScheme=SchemaRegistry or TypeDetectionScheme=SchemaRegistryAggregate
- If message keys are enabled (see MessageKeyType), and no key schema is available for a topic
- If AllowKeyOnlyRegistryTopics=true, and no value schema is available for a topic
- If either the key or value schema exists, but is in an unsupported format

This option determines whether the Cloud exposes fallback columns for parts of the message that have no schema information. In the following descriptions, value columns include both the Message column used for single values as well as columns flattened from the value. Similarly, key columns include both the top-level key column (named by MessageKeyColumn) and columns flattened from the key. Refer to Extracting Metadata From Topics for more information on column flattening.

By default the Cloud follows these legacy rules, which are compatible with versions of the Cloud before this option was introduced:

For TypeDetectionScheme=RowScan and TypeDetectionScheme=None, no value columns are exposed if none of the scanned messages have values. The same applies for keys if they are enabled.
For TypeDetectionScheme=SchemaRegistry or TypeDetectionScheme=SchemaRegistryAggregate, the behavior depends upon AllowKeyOnlyRegistryTopics. By default key-only topics are disabled and topics without value schemas are not exposed as tables. When key-only topics are enabled, topics without value schemas expose a fallback Message column that reports base64 values. In either case, topics without key schemas do not expose key columns.

The other options provide consistent behavior regardless of the TypeDetectionScheme:

With HideUnusedColumns=Never, the Cloud exposes fallback columns for topic values, and for keys if they are enabled. The fallback value column is called Message and the fallback key column is named by MessageKeyColumn. Both fallback columns report base64 values.
With HideUnusedColumns=Key, the Cloud does not expose a fallback column for keys. It still exposes a fallback column for values. If key columns are not enabled, this option has the same behavior as Never.
With HideUnusedColumns=Value, the Cloud does not expose a fallback column for values. It still exposes a fallback column for keys if keys are enabled.
With HideUnusedColumns=KeyAndValue, the Cloud never exposes fallback columns.
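The non-Legacy values can be summarized as a small decision function. This is a sketch under stated assumptions: it models a topic where neither key nor value schema information is available, it uses the value names from the Possible Values list, and the fallback_columns helper and the default column name Key are hypothetical.

```python
def fallback_columns(hide_unused_columns, keys_enabled, message_key_column="Key"):
    """Return the fallback columns exposed when schema information is missing.

    Covers only the non-Legacy values; Legacy depends on TypeDetectionScheme.
    """
    if hide_unused_columns not in ("Never", "Key", "Value", "KeyAndValue"):
        raise ValueError(f"unsupported value: {hide_unused_columns!r}")
    hide_key = hide_unused_columns in ("Key", "KeyAndValue")
    hide_value = hide_unused_columns in ("Value", "KeyAndValue")

    columns = []
    if keys_enabled and not hide_key:
        columns.append(message_key_column)  # fallback key column (base64)
    if not hide_value:
        columns.append("Message")           # fallback value column (base64)
    return columns


for mode in ("Never", "Key", "Value", "KeyAndValue"):
    print(mode, fallback_columns(mode, keys_enabled=True))
```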

Note that there are no cases where TypeDetectionScheme=MessageOnly and schema information is missing. In MessageOnly mode, the Cloud does not use any external information to determine the format of the topic's values. The Cloud always returns a single Message column that is decoded according to SerializationFormat. Key columns behave the same way but using MessageKeyType.

For the same reason, the Cloud does not consider a column unused if it uses a primitive value format. Setting SerializationFormat to Long, Integer, Float, Double, or String prevents the Cloud from accessing the broker or schema registry to determine value schemas. The same applies for MessageKeyType and key schemas.

CData Cloud

MaximumBatchSize

Specifies the maximum size, in bytes, of a batch of messages that the provider gathers before sending the batch to Apache Kafka.

Data Type

string

Default Value

"16384"

Remarks

A batch may contain one or more messages. The Cloud accumulates messages for a single partition until the total size of the batch reaches this value, at which point the batch is sent to Apache Kafka.

If a single message exceeds the batch size, the message is sent on its own. Apache Kafka uses batching to increase throughput, so larger batch sizes may reduce the number of requests sent to the broker, while smaller batch sizes may send messages more quickly, but with additional request overhead.
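The size-based grouping can be illustrated with a simplified model. The sketch assumes a single partition and counts only payload bytes; real Kafka producers also account for record overhead and timing settings, which are ignored here. The batch_messages helper is hypothetical.

```python
def batch_messages(messages, maximum_batch_size=16384):
    """Group byte-string messages into batches of at most maximum_batch_size bytes.

    A message larger than the limit is placed in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for message in messages:
        # Flush the open batch if adding this message would exceed the limit.
        if current and current_size + len(message) > maximum_batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(message)
        current_size += len(message)
    if current:
        batches.append(current)
    return batches


payloads = [b"a" * 4, b"b" * 4, b"c" * 20, b"d" * 3, b"e" * 3]
sizes = [[len(m) for m in batch] for batch in batch_messages(payloads, 10)]
print(sizes)
```

With a 10-byte limit, the two 4-byte messages share a batch, the 20-byte message is sent alone, and the two 3-byte messages share the final batch.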

CData Cloud

MaxRows

Specifies the maximum number of rows returned for queries that do not include either aggregation or GROUP BY.

Data Type

int

Default Value

-1

Remarks

The default value for this property, -1, means that no row limit is enforced unless the query explicitly includes a LIMIT clause. (When a query includes a LIMIT clause, the value specified in the query takes precedence over the MaxRows setting.)

Setting MaxRows to a whole number greater than 0 ensures that queries do not return excessively large result sets by default.

This property is useful for optimizing performance and preventing excessive resource consumption when executing queries that could otherwise return very large datasets.
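The precedence between MaxRows and an explicit LIMIT clause can be expressed as a short sketch. The effective_row_limit helper is hypothetical, and treating a MaxRows value of 0 as "no limit" is an assumption, since this page defines only -1 and values greater than 0.

```python
def effective_row_limit(max_rows=-1, query_limit=None):
    """Return the row cap for a non-aggregate query, or None for no cap."""
    if query_limit is not None:
        return query_limit  # an explicit LIMIT takes precedence over MaxRows
    if max_rows > 0:
        return max_rows
    return None             # -1 (the default) means no limit is enforced


print(effective_row_limit())                               # no cap
print(effective_row_limit(max_rows=500))                   # capped by MaxRows
print(effective_row_limit(max_rows=500, query_limit=10))   # capped by LIMIT
```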

CData Cloud

MessageKeyColumn

Specifies the name of the column where the provider stores the message key for each record.

Data Type

string

Default Value

Remarks

Set this property to expose message key data as a column in the table schema. If this property is not set, message key data is not included in the result set. See MessageKeyType for details on how the Cloud interprets and formats message keys.

CData Cloud

MessageKeyType

The type of data stored in message keys.

Possible Values

Null, Binary, String, Long, Integer, Float, Double, Avro, CSV, CSV_WITH_HEADERS, XML, JSON, Auto

Data Type

string

Default Value

"Null"

Remarks

By default the Cloud does not report message keys. To enable message keys, this property must be set to a value other than Null and MessageKeyColumn must be set to a valid column name.

See Extracting Metadata From Topics for a description of how this interacts with the TypeDetectionScheme property. SerializationFormat describes how each of these supported formats is encoded. There are three main differences between how that property and this property work:

Complex key columns are always prefixed with MessageKeyColumn and a dot, while primitive key columns use MessageKeyColumn as their name. For example, if MessageKeyColumn is Key, an Avro key would expose columns like Key.field1. A string key would expose a single column called Key.
SerializationFormat uses NONE for binary fields while this property uses Binary.
The Cloud only supports reading key schemas from the registry when connected to a Confluent RegistryService. Confluent registries use a naming convention that allows for both key and value schemas that cover the same topic.
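The naming rule for key columns can be sketched as follows. This is an illustration only: the key_column_names helper is hypothetical, the field names are taken as input rather than discovered from a schema, and Auto is left out because its format is determined at runtime.

```python
COMPLEX_KEY_TYPES = {"Avro", "CSV", "CSV_WITH_HEADERS", "XML", "JSON"}


def key_column_names(message_key_column, message_key_type, key_fields=()):
    """Return the column names exposed for a message key."""
    if message_key_type == "Null" or not message_key_column:
        return []  # keys are reported only when both properties are set
    if message_key_type in COMPLEX_KEY_TYPES:
        # Complex keys: one column per field, prefixed with the key column name.
        return [f"{message_key_column}.{field}" for field in key_fields]
    # Primitive keys: a single column named by MessageKeyColumn.
    return [message_key_column]


print(key_column_names("Key", "Avro", ["field1", "field2"]))
print(key_column_names("Key", "String"))
print(key_column_names("Key", "Null"))
```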

CData Cloud

NonRegistryTypeDetectionScheme

Specifies the TypeDetectionScheme to use for topics that do not have schemas in the schema registry.

Possible Values

Disabled, None, RowScan, MessageOnly

Data Type

string

Default Value

"Disabled"

Remarks

When TypeDetectionScheme is set to a registry-type detection scheme, the Cloud exposes only topics that have schemas registered in the schema registry. Topics without registry entries are skipped and do not appear as tables.

Set this property to a value other than Disabled to expose both registry and non-registry topics. Topics with registry schemas continue to use the registry for column definitions, and topics without registry schemas use the detection scheme specified in this property.

The available values for this property are: Disabled, None, RowScan, and MessageOnly. These values have the same meaning as the corresponding options in TypeDetectionScheme.

See Extracting Metadata From Topics for details on how type detection works.

CData Cloud

OffsetResetStrategy

Specifies how the provider determines the starting offset when no committed offset exists for the consumer group.

Possible Values

Earliest, Latest

Data Type

string

Default Value

"Earliest"

Remarks

This property applies when the consumer group has no previously committed offset for a partition.

Select one of the following strategies:


Value	Description
Earliest	Start reading from the beginning of the partition.
Latest	Start reading from the end of the partition, consuming only messages produced after the consumer group begins reading.

Once a committed offset exists for a consumer group, Apache Kafka resumes reading from the committed offset regardless of this property's value.
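The decision described above can be written as a small function. The starting_offset helper and its parameters are hypothetical; earliest and latest stand for the first available offset and the end offset of the partition.

```python
def starting_offset(committed_offset, offset_reset_strategy, earliest, latest):
    """Pick the offset where reading starts for one partition."""
    if committed_offset is not None:
        return committed_offset  # a committed offset always wins
    if offset_reset_strategy == "Earliest":
        return earliest          # read from the beginning of the partition
    if offset_reset_strategy == "Latest":
        return latest            # read only newly produced messages
    raise ValueError(f"unknown strategy: {offset_reset_strategy!r}")


print(starting_offset(None, "Earliest", earliest=0, latest=250))
print(starting_offset(None, "Latest", earliest=0, latest=250))
print(starting_offset(120, "Latest", earliest=0, latest=250))
```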

CData Cloud

Pagesize

Specifies the maximum number of rows that the provider retrieves from Apache Kafka in a single read operation.

Data Type

int

Default Value

1000

Remarks

The Cloud reads messages from Apache Kafka in batches and stores them in memory before returning rows to the result set. Only the first row of each batch requires a network request while subsequent rows are returned directly from the in-memory buffer.

This property controls the maximum number of rows the Cloud holds in this buffer at one time. Higher values reduce how often the Cloud must request new data, while lower values reduce memory usage.

CData Cloud

ProducerProperties

Specifies additional Apache Kafka producer configuration options that the provider passes directly to the client.

Data Type

string

Default Value

Remarks

Use this property to supply Apache Kafka producer settings that are not exposed as dedicated connection properties. This property accepts a semicolon-separated list of key–value pairs, and all options are passed directly to the underlying producer without validation by the Cloud.

For example, you can specify properties such as acks or compression.type by including them in this property.

Be careful about which configuration options you set via this property. The Cloud does not consider ProducerProperties a sensitive property, and its value appears in logs.

See ConsumerProperties for a description of how custom client configuration properties are handled.

CData Cloud

PseudoColumns

Specifies the pseudocolumns to expose as table columns, expressed as a string in the format 'TableName=ColumnName;TableName=ColumnName'.

Data Type

string

Default Value

Remarks

This property allows you to define which pseudocolumns the Cloud exposes as table columns.

To specify individual pseudocolumns, use the following format:

Table1=Column1;Table1=Column2;Table2=Column3

To include all pseudocolumns for all tables use:

*=*
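To make the format concrete, the sketch below parses a PseudoColumns value into a mapping of table name to column names. The parse_pseudo_columns helper is hypothetical and only illustrates the syntax; it does not reflect how the Cloud processes the value internally.

```python
def parse_pseudo_columns(value):
    """Parse 'Table=Column;Table=Column' into {table: [columns]}."""
    tables = {}
    for pair in filter(None, value.split(";")):
        table, _, column = pair.partition("=")
        tables.setdefault(table.strip(), []).append(column.strip())
    return tables


print(parse_pseudo_columns("Table1=Column1;Table1=Column2;Table2=Column3"))
print(parse_pseudo_columns("*=*"))
```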

CData Cloud

ReadDuration

Specifies how long, in seconds, the provider waits for additional messages after a read operation begins.

Data Type

int

Default Value

Remarks

This property determines the amount of time the Cloud continues polling Apache Kafka for new messages once it has started reading from a topic. If new messages arrive before the duration expires, they are included in the result set. When the duration elapses with no new messages, the read operation completes.

CData Cloud

RowScanDepth

Specifies the maximum number of messages that the provider scans to determine the columns and data types for a topic.

Data Type

string

Default Value

"100"

Remarks

The Cloud inspects message values to infer a topic’s schema when using a type detection scheme that requires scanning. This property limits how many messages are read during that process.

Higher values allow the Cloud to discover more fields and infer more accurate data types, but may increase startup time for the query. Lower values reduce scanning time, but may cause some fields or data types to be missed if they do not appear within the scanned messages.
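The trade-off can be seen in a simplified model of row scanning. The sketch assumes JSON message values and a "first type seen wins" rule, both of which are assumptions made for illustration; the infer_columns helper is hypothetical.

```python
import json


def infer_columns(messages, row_scan_depth=100):
    """Infer {column: type name} from at most row_scan_depth JSON messages."""
    columns = {}
    for raw in messages[:row_scan_depth]:
        for name, value in json.loads(raw).items():
            columns.setdefault(name, type(value).__name__)
    return columns


messages = [
    '{"id": 1}',
    '{"id": 2, "name": "Amy"}',
    '{"id": 3, "email": "amy@example.com"}',
]
print(infer_columns(messages, row_scan_depth=2))  # email is not discovered
print(infer_columns(messages, row_scan_depth=3))  # all three columns found
```

Scanning only two messages misses the email field because it first appears in the third message.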

CData Cloud

SASLOAuthExtensions

Specifies the extension values to send with OAuth auth schemes.

Data Type

string

Default Value

Remarks

When authenticating using OAuth, the Cloud obtains an OAuth access token according to the chosen AuthScheme. The Cloud uses the Kafka native library to authenticate to the broker using SASL OAUTHBEARER. The OAUTHBEARER mechanism sends the access token along with an optional list of extension settings.

This property accepts a semicolon-separated list of key-value pairs. These options are mapped to extension settings during OAUTHBEARER authentication. For example, clusterId=1;computePool=primary sets the clusterId and computePool extension settings.

Extension settings are vendor-defined. Refer to your provider's documentation, since individual managed Kafka providers may allow (or even require) specific extensions. Providers typically document these settings as JAAS configurations because the Kafka client library is configured with this format. JAAS configurations prefix extension settings with extension_. To provide the following configuration to the Cloud, set SASLOAuthExtensions to logicalCluster=clust-123;identityPoolId=mypool.

sasl.jaas.config= \
  org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
    clientId='...' \
    scope='...' \
    clientSecret='...' \
    extension_logicalCluster='clust-123' \
    extension_identityPoolId='mypool';
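The mapping from JAAS options to this property can be sketched as a small conversion. The extensions_from_jaas helper is hypothetical, and the JAAS options are assumed to be already available as a dictionary rather than parsed from the configuration text.

```python
def extensions_from_jaas(jaas_options):
    """Build a SASLOAuthExtensions value from JAAS options.

    Only options prefixed with extension_ are kept, and the prefix is removed.
    """
    prefix = "extension_"
    pairs = [
        f"{key[len(prefix):]}={value}"
        for key, value in jaas_options.items()
        if key.startswith(prefix)
    ]
    return ";".join(pairs)


jaas_options = {
    "clientId": "...",
    "scope": "...",
    "clientSecret": "...",
    "extension_logicalCluster": "clust-123",
    "extension_identityPoolId": "mypool",
}
print(extensions_from_jaas(jaas_options))
```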

This option only applies to the following AuthScheme configurations. Other authentication schemes ignore this property because they do not use OAUTHBEARER.

KafkaOAuthClient
AzureAD
AzureMSI
AzureServicePrincipal
AzureServicePrincipalCert
OAuthJWT
GCPInstanceAccount
AWSWorkloadIdentity

CData Cloud

SchemaRegistryOnly

Specifies whether the provider connects only to the schema registry.

Data Type

bool

Default Value

false

Remarks

When set to true, the Cloud disables all broker connections and operates only against the schema registry. In this mode, the Cloud does not perform any operations that require access to the Apache Kafka broker. Attempting any of the following operations in registry-only mode results in an error:

SELECT queries (excluding system tables)
INSERT statements
CREATE TABLE statements
CommitOffset
ProduceMessage

Registry-only mode requires the following configuration settings:

TypeDetectionScheme must be set to SchemaRegistry. Any other value requires a broker connection to list tables.
ValidateRegistryTopics must be set to false. When validation is enabled, the Cloud compares broker topics with registry subjects, which requires connecting to the Apache Kafka broker.
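A connection configuration can be checked against these two requirements before it is used. The sketch below is illustrative: the registry_only_problems helper is hypothetical, the settings are modeled as a plain dictionary, and a missing setting is reported as a problem because the defaults are not restated here.

```python
def registry_only_problems(settings):
    """List the settings that conflict with SchemaRegistryOnly=true."""
    problems = []
    if not settings.get("SchemaRegistryOnly", False):
        return problems  # broker connections are allowed, nothing to check
    if settings.get("TypeDetectionScheme") != "SchemaRegistry":
        problems.append("TypeDetectionScheme must be SchemaRegistry")
    if settings.get("ValidateRegistryTopics") is not False:
        problems.append("ValidateRegistryTopics must be false")
    return problems


print(registry_only_problems({
    "SchemaRegistryOnly": True,
    "TypeDetectionScheme": "SchemaRegistry",
    "ValidateRegistryTopics": False,
}))
print(registry_only_problems({
    "SchemaRegistryOnly": True,
    "TypeDetectionScheme": "RowScan",
    "ValidateRegistryTopics": True,
}))
```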

CData Cloud

SerializationFormat

Specifies how to serialize/deserialize message contents.

Possible Values

NONE, AUTO, JSON, CSV, CSV_WITH_HEADERS, XML, AVRO, LONG, INTEGER, FLOAT, DOUBLE, STRING

Data Type

string

Default Value

"AUTO"

Remarks

The Cloud uses this property differently based on the value of TypeDetectionScheme. See Extracting Metadata From Topics for details on how these properties interact.

Primitive and Complex Formats

This section applies only when TypeDetectionScheme is set to SchemaRegistry, None, or RowScan. MessageOnly always reports the message as a single column regardless of the format. The only difference is the column type.

The Cloud supports two different kinds of formats: primitive formats and complex formats. Primitive formats are reported in a single column called Message. The primitive formats use encodings that are compatible with the kafka-clients and Confluent.Kafka libraries.

Avro, CSV, CSV_WITH_HEADERS, XML, and JSON are all complex formats. The Cloud parses these formats into one or more columns, flattening nested Avro, XML, and JSON values as necessary.

Auto is also a complex format, but the exact data format is determined at runtime. The Cloud determines whether a value is Avro, CSV, XML, or JSON by looking for either a specific header (the Avro OBJ header) or specific characters. If none of these methods succeed the Cloud assumes the value is CSV.

Available formats:


Format	Description
NONE	Message is always BASE64 encoded on both the consume and produce operations.
AUTO	Attempts to automatically detect the current topic's serialization format. See Extracting Metadata From Topics for more information.
JSON	Message is serialized using the JSON format.
CSV	Message is serialized using the CSV format.
CSV_WITH_HEADERS	Message is serialized using the CSV format with a separate header line before the data. This option only applies to messages created using INSERT. In SELECT, it behaves the same as CSV.
XML	Message is serialized using the XML format.
AVRO	Message is serialized using the Avro format.
LONG	Message is serialized as a 64-bit big-endian integer.
INTEGER	Message is serialized as a 32-bit big-endian integer.
FLOAT	Message is serialized as a 32-bit floating-point number.
DOUBLE	Message is serialized as a 64-bit floating-point number.
STRING	Message is serialized as text. By default the Cloud uses UTF-8, but setting Charset overrides this.
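
To make the primitive encodings in the table concrete, the following sketch produces the bytes for each one using Python's standard struct module. The function name is hypothetical. Big-endian byte order is stated above for LONG and INTEGER; for FLOAT and DOUBLE it is an assumption here, based on how the kafka-clients serializers write those types.

```python
import base64
import struct


def encode_primitive(fmt: str, value) -> bytes:
    """Encode a value as raw message bytes for one of the primitive formats."""
    if fmt == "LONG":
        return struct.pack(">q", value)  # 64-bit big-endian integer
    if fmt == "INTEGER":
        return struct.pack(">i", value)  # 32-bit big-endian integer
    if fmt == "FLOAT":
        return struct.pack(">f", value)  # 32-bit IEEE 754 (byte order assumed)
    if fmt == "DOUBLE":
        return struct.pack(">d", value)  # 64-bit IEEE 754 (byte order assumed)
    if fmt == "STRING":
        return value.encode("utf-8")     # UTF-8 is the default charset
    raise ValueError(f"not a primitive format: {fmt}")


raw = encode_primitive("LONG", 1)
print(raw.hex())
# With NONE, the same bytes would be surfaced as Base64 text instead.
print(base64.b64encode(raw).decode("ascii"))
```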

ThrowsKeyNotFound

Specifies whether an exception is thrown if no rows are updated.

Data Type

bool

Default Value

false

Remarks

Specifies whether an exception is thrown if no rows are updated.

Timeout

Specifies the maximum time, in seconds, that the Cloud waits for a server response before throwing a timeout error.

Data Type

int

Default Value

60

Remarks

The timeout applies to each individual communication with the server rather than the entire query or operation. For example, a query could continue running beyond 60 seconds if each paging call completes within the timeout limit.

Timeout is set to 60 seconds by default. To disable timeouts, set this property to 0.

Disabling the timeout allows operations to run indefinitely until they succeed or fail due to other conditions such as server-side timeouts, network interruptions, or resource limits on the server.

Note: Use this property cautiously to avoid long-running operations that could degrade performance or result in unresponsive behavior.
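
The per-call behavior described above can be modeled with a short simulation. This is a minimal sketch under stated assumptions: the function and its list of per-call durations are hypothetical stand-ins for real paging requests, and no actual network calls are made.

```python
def run_paged_query(call_durations, timeout=60):
    """Return total elapsed seconds, or raise if any single call exceeds the timeout."""
    elapsed = 0
    for seconds in call_durations:
        # A timeout of 0 disables the check entirely.
        if timeout and seconds > timeout:
            raise TimeoutError(f"call took {seconds}s, limit is {timeout}s")
        elapsed += seconds
    return elapsed


# Three paging calls of 45, 50, and 40 seconds: 135 seconds in total,
# but no individual call exceeds the 60-second limit, so nothing is raised.
print(run_paged_query([45, 50, 40]))
```

The same list with a single 61-second call would raise, even though the total could be far shorter than in the example above.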

TypeDetectionScheme

Specifies how the Cloud determines the available fields and data types for each topic.

Possible Values

None, RowScan, SchemaRegistry, SchemaRegistryAggregate, MessageOnly

Data Type

string

Default Value

"None"

Remarks

This property controls whether the Cloud reads message data, uses schema registry information, or combines multiple registry versions to detect columns. Different options may require access to the schema registry or may scan message data directly. See Extracting Metadata From Topics for more information on how this property interacts with different values of SerializationFormat, RegistryService, and MessageKeyType.

UseConfluentAvroFormat

Specifies how Avro data should be formatted during an INSERT.

Data Type

bool

Default Value

false

Remarks

When set to false, the Cloud writes Avro data using standard Avro file blocks as defined in the Avro specification. This format allows multiple rows to be written into a single Kafka message and is more compact, but it is not compatible with Confluent tools or Confluent schema validation.

When set to true, the Cloud writes each row as a separate message using the Confluent Avro format. Enable this option if you rely on Confluent schema validation or need compatibility with Confluent serialization libraries.

This option cannot be enabled unless RegistryURL is set and points to a Confluent schema registry. AWS Glue registries do not support schema IDs, which are required by the Confluent Avro format.
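
The dependency on schema IDs follows from the Confluent wire format, in which every message starts with a zero magic byte and a 4-byte big-endian schema ID, followed by the Avro-encoded record. The sketch below frames and reads such a header; the helper names are hypothetical, and the code illustrates the layout only, not how the Cloud produces messages internally.

```python
import struct


def frame_confluent(schema_id: int, avro_body: bytes) -> bytes:
    """Prefix an Avro-encoded record with the Confluent wire-format header."""
    magic = 0  # the wire format's magic byte is always 0
    # 1-byte magic + 4-byte big-endian schema ID, then the Avro binary payload.
    return struct.pack(">bI", magic, schema_id) + avro_body


def read_schema_id(message: bytes) -> int:
    """Extract the schema ID from a Confluent-framed message."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("not a Confluent-framed message")
    return schema_id


framed = frame_confluent(42, b"\x02")  # b"\x02" stands in for an Avro-encoded record
print(framed.hex(), read_schema_id(framed))
```

Because the header carries only one record's schema ID, each row becomes its own message in this format, in contrast to the multi-row Avro file blocks used when the option is false.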

ValidateRegistryTopics

Specifies whether or not to validate schema registry topics against the Apache Kafka broker. Only has an effect when TypeDetectionScheme is set to SchemaRegistry.

Data Type

bool

Default Value

true

Remarks

Schema registries can include metadata for topics that cannot be accessed in Kafka. This can happen because the topic doesn't exist on the broker. It can also happen because the principal that the connection authenticates as does not have access to the topic.

By default, the Cloud will get a list of schemas from the registry and then filter out any that the broker does not report. All the remaining valid topics are exposed as tables. You can disable this behavior by setting this option to false. This will report all schemas in the registry as tables regardless of whether they are accessible on the broker.
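
The filtering described above amounts to intersecting the registry's topic list with the broker's. The following sketch shows both settings side by side; the function name and sample topic names are hypothetical, and the sorted output order is an assumption made for readability.

```python
def visible_tables(registry_topics, broker_topics, validate=True):
    """Return the topic names that would be exposed as tables."""
    if not validate:
        # Report every schema in the registry, accessible or not.
        return sorted(registry_topics)
    # Keep only the registry topics that the broker also reports.
    return sorted(set(registry_topics) & set(broker_topics))


registry = ["orders", "payments", "retired_topic"]
broker = ["orders", "payments", "logs"]
print(visible_tables(registry, broker))         # ['orders', 'payments']
print(visible_tables(registry, broker, False))  # ['orders', 'payments', 'retired_topic']
```

With validation disabled, the broker list is never consulted, which is why this setting is needed when no broker connection is available.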

WorkloadPoolId

The ID of your Workload Identity Federation pool.

Data Type

string

Default Value

Remarks

The ID of your Workload Identity Federation pool.

WorkloadProjectId

The ID of the Google Cloud project that hosts your Workload Identity Federation pool.

Data Type

string

Default Value

Remarks

The ID of the Google Cloud project that hosts your Workload Identity Federation pool.

WorkloadProviderId

The ID of your Workload Identity Federation pool provider.

Data Type

string

Default Value

Remarks

The ID of your Workload Identity Federation pool provider.

Third Party Copyrights

LZMA from 7Zip LZMA SDK

LZMA SDK is placed in the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute the original LZMA SDK code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

LZMA2 from XZ SDK

Version 1.9 and older are in the public domain.

Xamarin.Forms

Xamarin SDK

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

NSIS 3.10

Copyright (C) 1999-2025 Contributors

THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS COMMON PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT.

1. DEFINITIONS

"Contribution" means:

a) in the case of the initial Contributor, the initial code and documentation distributed under this Agreement, and

b) in the case of each subsequent Contributor:

i) changes to the Program, and

ii) additions to the Program;

where such changes and/or additions to the Program originate from and are distributed by that particular Contributor. A Contribution 'originates' from a Contributor if it was added to the Program by such Contributor itself or anyone acting on such Contributor's behalf. Contributions do not include additions to the Program which: (i) are separate modules of software distributed in conjunction with the Program under their own license agreement, and (ii) are not derivative works of the Program.

"Contributor" means any person or entity that distributes the Program.

"Licensed Patents " mean patent claims licensable by a Contributor which are necessarily infringed by the use or sale of its Contribution alone or when combined with the Program.

"Program" means the Contributions distributed in accordance with this Agreement.

"Recipient" means anyone who receives the Program under this Agreement, including all Contributors.

2. GRANT OF RIGHTS

a) Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide, royalty-free copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, distribute and sublicense the Contribution of such Contributor, if any, and such derivative works, in source code and object code form.

b) Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide, royalty-free patent license under Licensed Patents to make, use, sell, offer to sell, import and otherwise transfer the Contribution of such Contributor, if any, in source code and object code form. This patent license shall apply to the combination of the Contribution and the Program if, at the time the Contribution is added by the Contributor, such addition of the Contribution causes such combination to be covered by the Licensed Patents. The patent license shall not apply to any other combinations which include the Contribution. No hardware per se is licensed hereunder.

c) Recipient understands that although each Contributor grants the licenses to its Contributions set forth herein, no assurances are provided by any Contributor that the Program does not infringe the patent or other intellectual property rights of any other entity. Each Contributor disclaims any liability to Recipient for claims brought by any other entity based on infringement of intellectual property rights or otherwise. As a condition to exercising the rights and licenses granted hereunder, each Recipient hereby assumes sole responsibility to secure any other intellectual property rights needed, if any. For example, if a third party patent license is required to allow Recipient to distribute the Program, it is Recipient's responsibility to acquire that license before distributing the Program.

d) Each Contributor represents that to its knowledge it has sufficient copyright rights in its Contribution, if any, to grant the copyright license set forth in this Agreement.

3. REQUIREMENTS

A Contributor may choose to distribute the Program in object code form under its own license agreement, provided that:

a) it complies with the terms and conditions of this Agreement; and

b) its license agreement:

i) effectively disclaims on behalf of all Contributors all warranties and conditions, express and implied, including warranties or conditions of title and non-infringement, and implied warranties or conditions of merchantability and fitness for a particular purpose;

ii) effectively excludes on behalf of all Contributors all liability for damages, including direct, indirect, special, incidental and consequential damages, such as lost profits;

iii) states that any provisions which differ from this Agreement are offered by that Contributor alone and not by any other party; and

iv) states that source code for the Program is available from such Contributor, and informs licensees how to obtain it in a reasonable manner on or through a medium customarily used for software exchange.

When the Program is made available in source code form:

a) it must be made available under this Agreement; and

b) a copy of this Agreement must be included with each copy of the Program.

Contributors may not remove or alter any copyright notices contained within the Program.

Each Contributor must identify itself as the originator of its Contribution, if any, in a manner that reasonably allows subsequent Recipients to identify the originator of the Contribution.

4. COMMERCIAL DISTRIBUTION

Commercial distributors of software may accept certain responsibilities with respect to end users, business partners and the like. While this license is intended to facilitate the commercial use of the Program, the Contributor who includes the Program in a commercial product offering should do so in a manner which does not create potential liability for other Contributors. Therefore, if a Contributor includes the Program in a commercial product offering, such Contributor ("Commercial Contributor") hereby agrees to defend and indemnify every other Contributor ("Indemnified Contributor") against any losses, damages and costs (collectively "Losses") arising from claims, lawsuits and other legal actions brought by a third party against the Indemnified Contributor to the extent caused by the acts or omissions of such Commercial Contributor in connection with its distribution of the Program in a commercial product offering. The obligations in this section do not apply to any claims or Losses relating to any actual or alleged intellectual property infringement. In order to qualify, an Indemnified Contributor must: a) promptly notify the Commercial Contributor in writing of such claim, and b) allow the Commercial Contributor to control, and cooperate with the Commercial Contributor in, the defense and any related settlement negotiations. The Indemnified Contributor may participate in any such claim at its own expense.

For example, a Contributor might include the Program in a commercial product offering, Product X. That Contributor is then a Commercial Contributor. If that Commercial Contributor then makes performance claims, or offers warranties related to Product X, those performance claims and warranties are such Commercial Contributor's responsibility alone. Under this section, the Commercial Contributor would have to defend claims against the other Contributors related to those performance claims and warranties, and if a court requires any other Contributor to pay any damages as a result, the Commercial Contributor must pay those damages.

5. NO WARRANTY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program and assumes all risks associated with its exercise of rights under this Agreement, including but not limited to the risks and costs of program errors, compliance with applicable laws, damage to or loss of data, programs or equipment, and unavailability or interruption of operations.

6. DISCLAIMER OF LIABILITY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. GENERAL

If any provision of this Agreement is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this Agreement, and without further action by the parties hereto, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable.

If Recipient institutes patent litigation against a Contributor with respect to a patent applicable to software (including a cross-claim or counterclaim in a lawsuit), then any patent licenses granted by that Contributor to such Recipient under this Agreement shall terminate as of the date such litigation is filed. In addition, if Recipient institutes patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Program itself (excluding combinations of the Program with other software or hardware) infringes such Recipient's patent(s), then such Recipient's rights granted under Section 2(b) shall terminate as of the date such litigation is filed.

All Recipient's rights under this Agreement shall terminate if it fails to comply with any of the material terms or conditions of this Agreement and does not cure such failure in a reasonable period of time after becoming aware of such noncompliance. If all Recipient's rights under this Agreement terminate, Recipient agrees to cease use and distribution of the Program as soon as reasonably practicable. However, Recipient's obligations under this Agreement and any licenses granted by Recipient relating to the Program shall continue and survive.

Everyone is permitted to copy and distribute copies of this Agreement, but in order to avoid inconsistency the Agreement is copyrighted and may only be modified in the following manner. The Agreement Steward reserves the right to publish new versions (including revisions) of this Agreement from time to time. No one other than the Agreement Steward has the right to modify this Agreement. IBM is the initial Agreement Steward. IBM may assign the responsibility to serve as the Agreement Steward to a suitable separate entity. Each new version of the Agreement will be given a distinguishing version number. The Program (including Contributions) may always be distributed subject to the version of the Agreement under which it was received. In addition, after a new version of the Agreement is published, Contributor may elect to distribute the Program (including its Contributions) under the new version. Except as expressly stated in Sections 2(a) and 2(b) above, Recipient receives no rights or licenses to the intellectual property of any Contributor under this Agreement, whether expressly, by implication, estoppel or otherwise. All rights in the Program not expressly granted under this Agreement are reserved.

This Agreement is governed by the laws of the State of New York and the intellectual property laws of the United States of America. No party to this Agreement will bring a legal action under this Agreement more than one year after the cause of action arose. Each party waives its rights to a jury trial in any resulting litigation.

Build 25.0.9539