ADO.NET Provider for Apache Kafka

Build 25.0.9434

SchemaMergeMode

How the provider exposes schemas with multiple versions.

Possible Values

None, Simple

Data Type

string

Default Value

"None"

Remarks

By default the provider uses SchemaMergeMode=None.

None

This mode supports only one version of each schema in the registry. This is normally the latest version, but you can set RegistryVersion to use a specific version number. The provider ignores the content of any message that does not match the schema for its topic. Reading the topic still returns the message, but all of its data fields (fields other than Partition, Offset, and Timestamp) are reported as NULL.

Limitations

This mode supports both SELECT and INSERT queries into each topic. An INSERT always uses the version of the schema specified by RegistryVersion.

This mode supports all options for RegistryService.

Schema Confusion

For compatibility with previous versions, the provider does not enforce the schema ID included in messages when using RegistryService=Confluent. With SchemaMergeMode=None this ID is always ignored. With SchemaMergeMode=Simple the provider still ignores the ID if it cannot find a matching schema. In either case, the provider may output field values under unexpected columns.

For example, consider the following two Avro schemas that store names and address details. The schemas are binary compatible: even though the field names differ, they have the same number of fields with the same types in the same order.

{
  "type": "record",
  "name": "personname",
  "fields": [
    { "name": "PersonID", "type": "int" },
    { "name": "LastName", "type": "string" },
    { "name": "FirstName", "type": "string" }
  ]
}

{
  "type": "record",
  "name": "personaddress",
  "fields": [
    { "name": "PersonID", "type": "int" },
    { "name": "Address", "type": "string" },
    { "name": "City", "type": "string" }
  ]
}

If you produce the following messages to a topic using the personname schema, the provider may parse them using the personaddress schema. This happens if, for example, personname and personaddress are two versions of the same registry schema: the provider sees that personaddress is the latest version and uses it for the topic.

{"PersonID": 1, "LastName": "Smithers", "FirstName": "William"}
{"PersonID": 2, "LastName": "McAllister", "FirstName": "Amy"}

In that scenario, the provider outputs these results:

PersonID  Address     City
1         Smithers    William
2         McAllister  Amy
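
The result above follows from how Avro serializes records: the binary form contains only the field values, in schema order, with no field names. The following is a minimal sketch of that encoding for int and string fields only, written from the Avro specification; it is not the provider's code, and the helper names (write_long, read_long, encode, decode) are chosen here for illustration.

```python
def write_long(n):
    """Encode an integer the way Avro does: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def read_long(buf, pos):
    """Decode a zigzag varint starting at pos; return (value, next_pos)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos


def encode(fields, record):
    """Serialize field values in schema order; no field names are written."""
    out = bytearray()
    for name, avro_type in fields:
        if avro_type == "int":
            out += write_long(record[name])
        else:  # "string": length prefix, then UTF-8 bytes
            raw = record[name].encode("utf-8")
            out += write_long(len(raw)) + raw
    return bytes(out)


def decode(fields, data):
    """Read values in schema order and label them with this schema's names."""
    row, pos = {}, 0
    for name, avro_type in fields:
        if avro_type == "int":
            row[name], pos = read_long(data, pos)
        else:
            length, pos = read_long(data, pos)
            row[name] = data[pos:pos + length].decode("utf-8")
            pos += length
    return row


# The two schemas from this section, reduced to (name, type) pairs.
PERSONNAME = [("PersonID", "int"), ("LastName", "string"), ("FirstName", "string")]
PERSONADDRESS = [("PersonID", "int"), ("Address", "string"), ("City", "string")]

payload = encode(PERSONNAME, {"PersonID": 1, "LastName": "Smithers", "FirstName": "William"})
row = decode(PERSONADDRESS, payload)
print(row)
```

Because both schemas list an int followed by two strings, decoding succeeds without error and the last name is reported in the Address column.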

Simple

Setting SchemaMergeMode=Simple causes the provider to load all versions of each topic schema and merge them according to the following rules. These rules ensure that the provider produces NULL or a valid value for each column. If any rule fails, the provider fails the schema merge by logging an error and outputting a schema with no data fields.

Limitations

This mode supports only SELECT queries. The provider has no way to select which version of a schema to use for INSERT queries. If you need to produce messages in this mode, use the ProduceMessage stored procedure.

This mode supports only RegistryService=Confluent. Messages produced with the Confluent libraries include the ID of the schema that their data conforms to. The provider uses this ID to determine which schema to parse each message with.

If a message does not have an ID, or if the ID refers to a schema that does not match the topic name, the provider defaults to the latest schema. This may cause field values to appear in unexpected columns if the schemas are different but produce compatible output. See the Schema Confusion section above for a more detailed discussion of this issue.
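
The fallback described above can be sketched as follows. The sketch assumes the Confluent wire format, in which a message starts with a zero magic byte followed by a 4-byte big-endian schema ID. The function name choose_schema_id, its parameters, and the sample IDs 41 and 42 are hypothetical; how the provider actually matches an ID to a topic is not documented here.

```python
import struct

MAGIC_BYTE = 0  # first byte of the Confluent wire format


def choose_schema_id(message, topic_schema_ids, latest_id):
    """Return the schema ID to parse with, defaulting to the latest version."""
    if len(message) >= 5 and message[0] == MAGIC_BYTE:
        (schema_id,) = struct.unpack(">I", message[1:5])  # big-endian 4-byte ID
        if schema_id in topic_schema_ids:
            return schema_id
    # No usable ID: fall back to the latest schema for the topic.
    return latest_id


# Hypothetical topic whose schema versions have registry IDs 41 and 42.
topic_ids = {41, 42}
framed = bytes([MAGIC_BYTE]) + struct.pack(">I", 41) + b"avro-bytes"
foreign = bytes([MAGIC_BYTE]) + struct.pack(">I", 99) + b"avro-bytes"
unframed = b"\x02\x10Smithers"

print(choose_schema_id(framed, topic_ids, 42))
print(choose_schema_id(foreign, topic_ids, 42))
print(choose_schema_id(unframed, topic_ids, 42))
```

Only the first message is parsed with its own schema. The other two fall back to the latest version, which is where mismatched columns can arise.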

Schema Validation Rules

If all versions of the schema are valid according to these rules, the provider includes every field from every version of the schema in the table.

  • Each field must have the same type in every version where it appears. Fields may appear in some versions and not others; they are reported as NULL when they are not present.
  • All versions must be Avro schemas.

During validation, the type of a field is the type that the provider uses to report the field, not its original Avro type. This means that two versions of a schema can have a field that is an aggregate (array, map, ...) in one version and a string in another. For example, the provider considers these two schemas compatible, but there is currently no way to tell whether a value in the address column is JSON or plain text.

{
    "type": "record",
    "name": "person",
    "fields": [
        { "name": "address", "type": {"type": "array", "items": "string"}}
    ]
}

{
    "type": "record",
    "name": "person",
    "fields": [
        { "name": "address", "type": "string" }
    ]
}
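
To illustrate why these two versions merge cleanly, here is a rough sketch of a merge that compares reported types rather than Avro types. The mapping in reported_type is an assumption made for this example (aggregates surface as strings, everything else keeps its Avro name); the provider's real type mapping is not listed on this page, and unions and logical types are not handled.

```python
AGGREGATES = {"array", "map", "record"}


def reported_type(avro_type):
    """Map an Avro type to the column type compared during the merge.

    Assumed mapping for illustration only: aggregates are reported as
    strings (JSON text); other types keep their Avro name.
    """
    name = avro_type["type"] if isinstance(avro_type, dict) else avro_type
    if isinstance(name, str) and name in AGGREGATES:
        return "string"
    return name


def merge_columns(versions):
    """Union the fields of all versions; fail if a field changes reported type."""
    columns = {}
    for schema in versions:
        for field in schema["fields"]:
            col_type = reported_type(field["type"])
            if columns.setdefault(field["name"], col_type) != col_type:
                raise ValueError("type conflict on field " + field["name"])
    return columns


person_v1 = {
    "type": "record",
    "name": "person",
    "fields": [{"name": "address", "type": {"type": "array", "items": "string"}}],
}
person_v2 = {
    "type": "record",
    "name": "person",
    "fields": [{"name": "address", "type": "string"}],
}

print(merge_columns([person_v1, person_v2]))
```

Both versions map address to a string column, so the merge succeeds and the column carries no hint about which form a given value takes.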

Remember that these rules are applied transitively. This means that two versions of the schema may be valid in isolation, but not when considering all versions of the schema. For example, consider a schema where v1 contains an integer amount field, v2 removes it, and v3 adds a decimal amount field. v1 and v2 are valid together because removing fields is allowed, and v2 and v3 are valid together because adding fields is allowed. However, all three versions combined violate the rules because the amount field changed type between v1 and v3.
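
The amount example can be checked with a small sketch. Each version is reduced to a mapping from field name to reported type; the id field is a hypothetical placeholder added so that v2 is not empty, and is_valid_together is an illustrative helper, not a provider API.

```python
def is_valid_together(versions):
    """True if every field keeps one type across all given versions."""
    seen = {}
    for fields in versions:
        for name, col_type in fields.items():
            if seen.setdefault(name, col_type) != col_type:
                return False
    return True


v1 = {"id": "int", "amount": "int"}
v2 = {"id": "int"}                       # amount removed
v3 = {"id": "int", "amount": "decimal"}  # amount re-added with a new type
versions = [v1, v2, v3]

# Checking only neighbouring versions misses the conflict.
adjacent_ok = all(is_valid_together([a, b]) for a, b in zip(versions, versions[1:]))
# Checking all versions at once finds it.
all_ok = is_valid_together(versions)

print(adjacent_ok, all_ok)
```

The adjacent check passes while the full check fails, which is the gap that a transitive compatibility mode in the registry closes.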

For best results, we recommend enabling one of the transitive schema compatibility modes within the schema registry. The provider does not check the compatibility mode as part of its validation rules. However, setting a transitive schema compatibility mode prevents you from creating schemas that the provider cannot process.

Copyright (c) 2025 CData Software, Inc. - All rights reserved.