RegistryArraysAsChildTables
Specifies whether the provider exposes arrays from schema registry schemas as separate tables.
Data Type
bool
Default Value
false
Remarks
By default the provider flattens array data from Avro and JSON topics into JSON aggregates. This ensures that a topic's data is all available when reading a single table, which is the best fit for the Kafka data model. Many core Kafka features (such as offset commits and consumer groups) operate on the topic level. Retrieving a topic's data in one shot is the most reliable way for the provider to interact with Kafka.
Enabling this option splits topics that contain array data into multiple tables. When listing tables, the provider fetches the value schema (and key schema, if enabled by MessageKeyType) for each topic in the registry. It then discovers all array types within those schemas and exposes a separate table for each one.
For example, if the provider discovers a schema called user_metrics-value:
{
"type": "record",
"name": "usermetric",
"fields": [
{"name": "userid", "type": "long"},
{"name": "aliases", "type": "array", "items": "string"},
{
"name": "demographics",
"type": "array",
"items": {
"type": "record",
"name": "userdemo",
"fields": [
{ "name": "minAge", "type": "int" },
{ "name": "maxAge", "type": "int" },
{
"name": "possible_locations",
"type": "array",
"items": {
"type": "record",
"fields": [
{ "name": "city", "type": "string" },
{ "name": "state", "type": "string" },
{ "name": "zip", "type": "string" }
]
}
}
]
}
},
{
"name": "interests",
"type": "array",
"items": {
"type": "record",
"name": "userinterests",
"fields": [
{ "name": "interest", "type": "string" },
{ "name": "strength", "type": "int" }
]
}
}
]
}
It exposes the following tables:
- user__metrics. The extra underscore prevents conflicts with other tables that share portions of this topic's name. For example, another topic may be named user and have an array called metrics.
- user__metrics_value_demographics. A single underscore separates the topic name from the path compoennts, and path components from each other. The _value_ component prevents conflicts when a topic has a key and a value schema.
- user__metrics_value_demographics_possible__locations Arrays nested in records or in other arrays include multiple path components.
- user__metrics_value_interests
The user__metrics table exposes the queue metadata columns Partition, Offset, and Timestamp. The Partition and Offset columns are reported as a compound primary key. It also reports the data columns userid and aliases. Array fields are only exposed as columns when they contain non-record values.
The user__metrics_value_demographics table exposes the queue metadata columns Partition, Offset, and Timestamp. These have the same values as their parent row in the user__metrics table, and the Partition and Offset columns are reported as a foreign key referencing it. It reports the data columns maxAge and minAge. In addition, it reports a primary key column called ArrayIndex. Its value is the index in the array that this child table row was extracted from.
The user__metrics_value_demographics_possible__locations table exposes the queue metadata columns Partition, Offset, and Timestamp. These have the same values as their parent row in the user__metrics_value_demographics table. It also exposes the array index columns ArrayIndex and ParentArrayIndex. ParentArrayIndex contains the same value as ArrayIndex in the user__metrics_value_demographcis table, and is a foreign key that references that column. ArrayIndex is the primary key for this table. Finally, it exposes the data columns city, state, and zip.
While this configuration may be conveneint for some workloads, it comes with several restrictions as well:
- TypeDetectionScheme must be set to SchemaRegistry. Child tables are not supported in non-registry type detection modes or with registry aggregates.
- SchemaMergeMode must be set to None. The provider only uses one version of the schema from the registry.
- ExposeQueueMetadataColumns must be set to true. The provider uses these columns as primary keys within the top-level table and foreign keys within first-level child tables.
- You must take care not to commit read offsets after querying a parent table if you plan to query its child later. When AutoCommit is enabled or you invoke the CommitOffset procedure, Kafka stores the offset for the entire topic. This means that, if you read the parent table and commit the offset (explicitly or automatically), the new read position takes effect for any child tables as well. Querying a child table after this point does not return any data because Kafka committed the read offset to the end of the topic. Only commit offsets when you read all child tables that you intend to query.
- Querying with JOINs is not supported. If you want to correlate data between tables, you must query them separately and analyze them outside the provider.
- Performing INSERTs is not supported. If you want to write to topics in child table mode, you must use the ProduceMessage stored procedure.