Apache Kafka

Version 24.3.9110



You can use the Apache Kafka connector from the CData Sync application to capture data from Apache Kafka and move it to any supported destination. To do so, you need to add the connector, authenticate to the connector, and complete your connection.

Note: With this connector, you can authenticate either to Kafka or to Azure Event Hubs.

Add the Apache Kafka Connector

To enable Sync to use data from Apache Kafka, you first must add the connector, as follows:

  1. Open the Connections page of the Sync dashboard.

  2. Click Add Connection to open the Select Connectors page.

  3. Click the Sources tab and locate the Apache Kafka row.

  4. Click the Configure Connection icon at the end of that row to open the New Connection page. If the Configure Connection icon is not available, click the Download Connector icon to install the Apache Kafka connector. For more information about installing new connectors, see Connections.

Authenticate to Apache Kafka

After you add the connector, you need to set the required properties.

  • Connection Name - Enter a connection name of your choice.

  • Bootstrap Servers - Enter the address of the Apache Kafka bootstrap servers to which you want to connect.
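If you want to check a Bootstrap Servers value before you enter it, the following Python sketch validates the Server:Port format. It is not part of Sync. The function name is hypothetical, and the comma-separated handling of several brokers follows the usual Kafka client convention, which is assumed here rather than taken from the Sync documentation.

```python
def parse_bootstrap_servers(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated 'Server:Port' list into (host, port) pairs."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        # Split on the last colon so that the port is always the final part.
        host, separator, port = entry.rpartition(":")
        if not separator or not host or not port.isdigit():
            raise ValueError(f"expected Server:Port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        servers.append((host, port_number))
    if not servers:
        raise ValueError("no bootstrap servers given")
    return servers
```

For example, parse_bootstrap_servers("broker1:9092, broker2:9093") returns two (host, port) pairs, and a value without a port raises ValueError.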

CData Sync supports authenticating to Apache Kafka in several ways. Select your authentication method below to proceed to the relevant section that contains the authentication details.

None

To connect without authentication, select None for Auth Scheme. No additional properties are required.

Plain

To connect with a plain-text login, specify these properties:

  • Auth Scheme - Select Plain.

  • User - Enter the username that you use to authenticate to your Apache Kafka account.

  • Password - Enter the password that you use to authenticate to your Apache Kafka account.

SCRAM

To connect with SCRAM (SHA-256) credentials, specify these properties:

  • Auth Scheme - Select SCRAM.

  • User - Enter the username that you use to authenticate to your Kafka account.

  • Password - Enter the password that you use to authenticate to your Kafka account.

SCRAM-SHA-512

To connect with SCRAM (SHA-512) credentials, specify these properties:

  • Auth Scheme - Select SCRAM-SHA-512.

  • User - Enter the username that you use to authenticate to your Kafka account.

  • Password - Enter the password that you use to authenticate to your Kafka account.

Kerberos

To connect with Kerberos, specify these properties:

  • Auth Scheme - Select Kerberos.

  • Kerberos SPN - Enter the service principal name (SPN) for the Kerberos domain controller.

  • Kerberos Service Name - Enter the name of the Kerberos service with which you want to authenticate.

  • Kerberos Keytab File (optional) - Enter the keytab file that contains your pairs of Kerberos principals and encrypted keys.

  • Use Kerberos Ticket Cache (optional) - Select True to use a ticket cache with the logged-in user instead of a keytab file. The default value is False.
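If you already know the standard Kafka client settings, it can help to see how the schemes above line up with SASL mechanisms. The following Python sketch builds a librdkafka-style property dictionary for each scheme. How Sync translates its Auth Scheme values internally is an assumption here, and the function name and the use_ssl parameter are hypothetical.

```python
# Assumed mapping from Sync Auth Scheme values to Kafka SASL mechanism names.
SASL_MECHANISMS = {
    "Plain": "PLAIN",
    "SCRAM": "SCRAM-SHA-256",
    "SCRAM-SHA-512": "SCRAM-SHA-512",
    "Kerberos": "GSSAPI",
}


def client_properties(bootstrap_servers, auth_scheme, user=None,
                      password=None, use_ssl=False):
    """Return librdkafka-style client properties for one Auth Scheme value."""
    props = {"bootstrap.servers": bootstrap_servers}
    if auth_scheme == "None":
        # No SASL layer at all; only transport encryption is configurable.
        props["security.protocol"] = "SSL" if use_ssl else "PLAINTEXT"
        return props
    if auth_scheme not in SASL_MECHANISMS:
        raise ValueError(f"unsupported Auth Scheme: {auth_scheme!r}")
    props["security.protocol"] = "SASL_SSL" if use_ssl else "SASL_PLAINTEXT"
    props["sasl.mechanism"] = SASL_MECHANISMS[auth_scheme]
    if auth_scheme != "Kerberos":
        # Plain and SCRAM send a user name and password; Kerberos does not.
        props["sasl.username"] = user
        props["sasl.password"] = password
    return props
```

The point of the sketch is that SCRAM and SCRAM-SHA-512 differ only in the hash function, while Plain sends the credentials as they are, so Plain is usually combined with SSL.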

Authenticate to Azure Event Hubs

After you add the connector, you need to set the following required properties to connect to Azure Event Hubs.

  • Connection Name - Enter a connection name of your choice.

  • Bootstrap Servers - Enter the address of the Apache Kafka bootstrap servers to which you want to connect.
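For Azure Event Hubs, the Kafka-compatible endpoint is normally the namespace host name on port 9093. The following Python sketch builds that value. It assumes the public Azure cloud (other Azure clouds use different domain suffixes), and the function name is hypothetical.

```python
def event_hubs_bootstrap_server(namespace: str) -> str:
    """Build the Kafka endpoint for an Event Hubs namespace (public Azure cloud)."""
    namespace = namespace.strip()
    if not namespace or "." in namespace or ":" in namespace:
        # Expect the bare namespace name, not a host name or URL.
        raise ValueError(f"expected a bare namespace name, got {namespace!r}")
    return f"{namespace}.servicebus.windows.net:9093"
```

For a namespace named mynamespace, the helper returns mynamespace.servicebus.windows.net:9093, which you can enter for Bootstrap Servers.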

CData Sync supports authenticating to Azure Event Hubs in several ways. Select your authentication method below to proceed to the relevant section that contains the authentication details.

SSL Certificate

To connect with a Secure Sockets Layer (SSL) client certificate, specify these properties:

  • Auth Scheme - Select SSLCertificate.

  • SSL Client Cert - Enter the SSL client certificate that is used to authenticate to the Apache Kafka broker.

  • SSL Client Cert Type - Select the format of the SSL client certificate that is used to connect to the Apache Kafka broker:

    • JKSFILE

    • PFXFILE

    • PEMKEY_FILE (default)

    • PEMKEY_BLOB

  • SSL Client Cert Password (optional) - Enter the password that is used to decrypt the SSL client certificate.

Azure Active Directory

To connect with Azure Active Directory (AD) credentials, specify the following properties:

  • Auth Scheme - Select AzureAD.

  • Azure Tenant - Enter the Microsoft Online tenant that is used to access data. If you do not specify a tenant, Sync uses the default tenant.

  • OAuth Client Id - Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.

  • OAuth Client Secret - Enter the client secret that you were assigned when you registered your application with an OAuth authorization server.

Azure Managed Service Identity

To leverage Azure Managed Service Identity (MSI) when CData Sync is running on an Azure virtual machine, select Azure MSI for Auth Scheme. No additional properties are required.

Azure Service Principal

To connect with an Azure service principal and client secret, set the following properties:

  • Auth Scheme - Select AzureServicePrincipal.

  • Azure Tenant - Enter the Microsoft Online tenant to which you want to connect.

  • OAuth Client Id - Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.

  • OAuth Client Secret - Enter the client secret that you were assigned when you registered your application with an OAuth authorization server.

To obtain the OAuth client Id and client secret for your application:

  1. Log in to the Azure portal.

  2. In the left navigation pane, select All services. Then, search for and select App registrations.

  3. Click New registration.

  4. Enter an application name and select Any Azure AD Directory - Multi Tenant. Set the redirect URI to the value that is specified for CallbackURL.

  5. After you create the application, copy the application (client) Id value that is displayed in the Overview section. Use this value as the OAuth client Id.

  6. Navigate to the Certificates & Secrets section and select New Client Secret for the application.

  7. Specify the duration and save the client secret. After you save it, the key value is displayed.

  8. Copy this value because it is displayed only once. You will use this value as the OAuth client secret.

  9. On the Authentication tab, make sure to select Access tokens (used for implicit flows).
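A service principal with a client secret typically authenticates through the OAuth 2.0 client-credentials grant. The following Python sketch only builds the token request (it does not send it) to show where the tenant, client Id, and client secret end up. Sync performs this exchange for you. The default scope value and the function name are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_token_request(tenant, client_id, client_secret,
                        scope="https://eventhubs.azure.net/.default"):
    """Return (url, form_body) for a client-credentials token request."""
    # Microsoft identity platform v2.0 token endpoint for the given tenant.
    url = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # OAuth Client Id
        "client_secret": client_secret,  # OAuth Client Secret
        "scope": scope,                  # assumed scope for Event Hubs
    })
    return url, body
```

The body is sent as application/x-www-form-urlencoded in a POST request. Because the client secret travels in that body, treat it like a password.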

Azure Service Principal Certificate

To connect with an Azure service principal and client certificate, set the following properties:

  • Auth Scheme - Select AzureServicePrincipalCert.

  • Azure Tenant - Enter the Microsoft Online tenant to which you want to connect.

  • OAuth Client Id - Enter the client Id that you were assigned when you registered your application with an OAuth authorization server.

  • OAuth JWT Cert - Enter your JSON Web Token (JWT) certificate store.

  • OAuth JWT Cert Type - Enter the type of key store that contains your JWT certificate. The default type is PEMKEY_BLOB.

  • OAuth JWT Cert Password (optional) - Enter the password for your OAuth JWT certificate.

  • OAuth JWT Cert Subject (optional) - Enter the subject of your OAuth JWT certificate.

To obtain the OAuth certificate for your application:

  1. Log in to the Azure portal.

  2. In the left navigation pane, select All services. Then, search for and select App registrations.

  3. Click New registration.

  4. Enter an application name and select Any Azure AD Directory - Multi Tenant. Set the redirect URI to the value that is specified for CallbackURL.

  5. After you create the application, copy the application (client) Id value that is displayed in the Overview section. Use this value as the OAuth client Id.

  6. Navigate to the Certificates & Secrets section and select Upload certificate. Then, select the certificate to upload from your local machine.

  7. Specify the duration and save the client secret. After you save it, the key value is displayed.

  8. Copy this value because it is displayed only once. You will use this value as the OAuth client secret.

  9. On the Authentication tab, make sure to select Access tokens (used for implicit flows).

Extracting Metadata from Topics

Reads in Apache Kafka do not have a natural stopping point. To avoid perpetual Read operations, items are read until either the ReadDuration property or the Timeout property expires. By default, ReadDuration is set to 30 seconds.
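You can picture the stopping rule as a loop with a deadline. The following Python sketch is not the connector's implementation and models only the ReadDuration limit, not Timeout. The poll argument stands in for any function that returns the next message or None, and the clock argument is a hypothetical hook that makes the example easy to test.

```python
import time


def read_until_deadline(poll, read_duration=30.0, clock=time.monotonic):
    """Collect messages from poll() until read_duration seconds have elapsed."""
    deadline = clock() + read_duration
    messages = []
    while clock() < deadline:
        # A real consumer poll blocks for a short time when no data is ready.
        message = poll()
        if message is not None:
            messages.append(message)
    return messages
```

A longer duration collects more messages per read but delays the moment at which the read returns.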

The Kafka driver models topics as tables and messages as rows, and it determines the schema of each table in one of two ways:

  • For services that contain a schema registry (for example, Confluent and instances hosted by Amazon Web Services), the schema is read directly from the schema registry.

  • For services that do not contain a schema registry, the driver infers the schema.

Schema Registry

The schema registry contains a list of topics that have registered schemas. The list of tables and columns is read directly from the schema registry.

To connect to a service with a schema registry, specify the following properties:

  • Bootstrap Servers - Enter the server (host name or IP address) and port (in the format Server:Port) of the Apache Kafka bootstrap servers.

  • Type Detection Scheme - Select SchemaRegistry. This scheme uses the Schema Registry API and a list of predefined AVRO schemas. For SchemaRegistry, specify the following properties:

    • Registry Url - Enter the URL to the server for the schema registry. When you specify this property, Sync reads the Apache Kafka schema from the server.

    • Registry Service - Select the Schema Registry service that you want to use for working with topic schemas.

      • Confluent (default)

      • AWSGlue

    • Registry Auth Scheme - Select the scheme that you want to use to authenticate to the schema registry.

      • None (default) - This scheme specifies that no authentication is used.

      • Basic - For this scheme, specify the following properties:

        • Registry User - Enter the username that you use to authenticate with the server that you specified earlier for Registry Url.

        • Registry Password - Enter the password that you use to authenticate with the server that you specified earlier for Registry Url.

      • SSLCertificate - This scheme specifies that SSL client-certificate authentication should be used. For this setting, you must specify the following:

        • Registry Client Cert - Enter the TLS/SSL client certificate store for SSL client authentication (two-way SSL) with the schema registry.

        • Registry Client Cert Password (optional) - Enter the password for your TLS/SSL client certificate.

        • Registry Client Cert Subject (optional) - Enter the subject of your TLS/SSL client certificate.

Confluent Schema Registry

When you connect to Confluent Cloud, the registry URL corresponds to the Schema Registry endpoint value that is located in Schemas > Schema Registry > Instructions.

The Confluent schema registry supports several authentication options. Typically, Confluent Cloud deployments require that you set the Registry Auth Scheme property to Basic, along with a registry user and registry password. To find your user and password, navigate to Schemas > Schema Registry > API Access and locate the access-key and secret-key values.

On-premises deployments might not require authentication. In that case, set Registry Auth Scheme to None. Other on-premises deployments require SSL client certificates. For those, set Registry Auth Scheme to SSLCertificate and specify the Registry Client Cert and Registry Client Cert Type options.
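To see what the Basic scheme amounts to on the wire, the following Python sketch builds (but does not send) a request for the latest schema of a topic from a Confluent schema registry. It assumes Confluent's default subject naming, in which the value schema of a topic is registered as the topic name followed by -value. The function name and the example host are hypothetical.

```python
import base64
import urllib.request
from urllib.parse import quote


def latest_schema_request(registry_url, topic, user=None, password=None):
    """Build a request for the latest value schema that is registered for a topic."""
    subject = f"{topic}-value"  # assumes the default subject naming strategy
    url = f"{registry_url.rstrip('/')}/subjects/{quote(subject, safe='')}/versions/latest"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.schemaregistry.v1+json"}
    )
    if user is not None:
        # Basic scheme: Registry User and Registry Password, base64-encoded.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


request = latest_schema_request(
    "https://registry.example.com/", "orders", "access-key", "secret-key"
)
print(request.full_url)
```

With Registry Auth Scheme set to None, the same request is sent without the Authorization header.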

Amazon Web Services (AWS) Glue Schema Registry

When you connect to AWS Glue, the registry URL corresponds to the Amazon Resource Name (ARN) value of the registry.

The AWS Glue schema registry supports only the Basic registry auth scheme. You should set Registry User and Registry Password, respectively, to the access key and secret key of a user that has access to the registry.
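Because the registry URL is an ARN rather than an HTTP address, it is easy to paste the wrong value. The following Python sketch checks that a string looks like a Glue registry ARN and splits it into its parts. The function name is hypothetical, and the check covers only the general ARN layout, not whether the registry exists.

```python
def parse_glue_registry_arn(arn: str) -> dict:
    """Split an ARN such as arn:aws:glue:REGION:ACCOUNT:registry/NAME into parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "glue":
        raise ValueError(f"not an AWS Glue ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "registry" or not name:
        raise ValueError(f"ARN does not point to a registry: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "registry_name": name,
    }
```

An ARN for a different Glue resource type, or a plain URL, raises ValueError.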

No-Schema Registry

To connect to a service without a schema registry, specify the following properties:

  • Bootstrap Servers - Enter the server (host name or IP address) and port (in the format Server:Port) of the Apache Kafka bootstrap servers.

  • Type Detection Scheme - Select RowScan. This scheme scans rows in order to determine the data type heuristically.

For schema discovery, the Sync application attempts to detect the format (AVRO/JSON/XML/CSV) automatically. You can also set the format explicitly with the Serialization Format property.
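As a rough illustration of automatic format detection, the following Python sketch guesses the format of a text payload. It is a simplified heuristic, not the logic that Sync uses. It leaves out AVRO, which is a binary format, and the function name is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def guess_text_format(payload: str) -> str:
    """Guess whether a text payload is JSON, XML, or CSV."""
    text = payload.strip()
    if text.startswith(("{", "[")):
        try:
            json.loads(text)
            return "JSON"
        except ValueError:
            pass  # looked like JSON but did not parse; keep trying
    if text.startswith("<"):
        try:
            ET.fromstring(text)
            return "XML"
        except ET.ParseError:
            pass
    # Fall back to CSV when at least one comma-separated field is present.
    return "CSV" if "," in text else "UNKNOWN"
```

A heuristic like this can misclassify unusual payloads, which is one reason to set Serialization Format explicitly when you know the format.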

After Sync reads the format, it analyzes the rows from the topic. If you want increased accuracy, set a higher value for the Row Scan Depth property. Be aware, though, that setting a higher row-scan depth might decrease performance.
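The trade-off behind Row Scan Depth can be shown with a small type-inference sketch in Python. It is not the algorithm that Sync uses. It handles only flat JSON objects, and the function name, the default depth, and the rule that conflicting types fall back to text are all assumptions for illustration.

```python
import json


def infer_columns(messages, row_scan_depth=100):
    """Infer a column-to-type mapping from the first row_scan_depth JSON messages."""
    columns = {}
    for raw in messages[:row_scan_depth]:
        for name, value in json.loads(raw).items():
            observed = type(value).__name__
            previous = columns.get(name)
            if previous is None or previous == "NoneType":
                # First sighting, or only nulls seen so far.
                columns[name] = observed
            elif observed not in (previous, "NoneType"):
                # Conflicting types across rows fall back to text.
                columns[name] = "str"
    return columns


rows = ['{"id": 1, "price": 9.5}', '{"id": 2, "price": "n/a", "note": "late"}']
shallow = infer_columns(rows, row_scan_depth=1)
deep = infer_columns(rows, row_scan_depth=2)
```

Here the shallow scan sees only the first row, so it types price as a number and misses the note column. The deeper scan finds both, at the cost of reading more rows.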

Then, Sync begins reading at the current offset (which you can configure with the Offset Reset Strategy property). From this point, future SELECT statements will start from the beginning.

Sync completes schema discovery by performing deserialization (based on the determined serialization format).

Complete Your Connection

To complete your connection:

  1. Specify the following properties:

    • Type Detection Scheme - Select a detection-scheme type to specify how Sync scans data to determine the fields and data types for the topic.

      • RowScan - See details in No-Schema Registry.

      • SchemaRegistry - See details in Schema Registry.

      • MessageOnly - Pushes all information as a single aggregate value on a column named Message.

    • Use SSL (optional) - Specify whether you want to use the Secure Sockets Layer (SSL) protocol. The default value is False.

  2. Define advanced connection settings on the Advanced tab. (In most cases, though, you should not need these settings.)

  3. If you authenticate with AzureAD, click Connect to Apache Kafka to connect to your Apache Kafka account.

  4. Click Create & Test to create your connection.

More Information

For more information about interactions between CData Sync and Apache Kafka, see Apache Kafka Connector for CData Sync.