CData Python Connector for Avro

Build 24.0.9060

Automatic Schema Discovery

By default, the connector automatically infers a relational schema by inspecting the Avro data. This section describes the connection properties available to configure these dynamic schemas.

Detecting Columns

The columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties. If FlattenObjects is set (this is the default), nested objects will be flattened into a series of columns.
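As a minimal sketch of how these two properties might be set together, the snippet below assembles them into a connection string. It assumes the usual semicolon-separated Name=Value connection string format; the build_connection_string helper is hypothetical and not part of the connector, and passing the result to the connector is not shown.

```python
def build_connection_string(properties):
    """Join property names and values into 'Name=Value;' pairs.

    Hypothetical helper for illustration only; it is not a connector API.
    """
    return "".join(f"{name}={value};" for name, value in properties.items())


# The property values here are examples, not recommendations.
connection_string = build_connection_string(
    {"FlattenObjects": "True", "FlattenArrays": "2"}
)
print(connection_string)
```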

Example Data Set

To provide an example of how these options work, consider the following schema:

{
  "type" : "record",
  "name" : "Root",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "long" ]
  }, {
    "name" : "name",
    "type" : [ "null", "string" ]
  }, {
    "name" : "annual_revenue",
    "type" : [ "null", "long" ]
  }, {
    "name" : "offices",
    "type" : {
      "type" : "array",
      "items" : "string"
    }
  }, {
    "name" : "address",
    "type" : [ "null", {
      "type" : "record",
      "name" : "Address",
      "namespace" : "root",
      "fields" : [ {
        "name" : "city",
        "type" : [ "null", "string" ]
      }, {
        "name" : "state",
        "type" : [ "null", "string" ]
      }, {
        "name" : "street",
        "type" : [ "null", "string" ]
      } ]
    } ]
  }]
}

Also consider the following example data for the above schema:

{
  "id": 12,
  "name": "Lohia Manufacturers Inc.",
  "annual_revenue": 35600000,
  "offices": [
    "Chapel Hill",
    "London",
    "New York"
  ],
  "address": {
    "city": "Chapel Hill",
    "state": "NC",
    "street": "Main Street"
  }
}

Using FlattenObjects

If FlattenObjects is set, all nested objects will be flattened into a series of columns. The above example will be represented by the following columns:

Column Name      Data Type   Example Value
id               Integer     12
name             String      Lohia Manufacturers Inc.
address.street   String      Main Street
address.city     String      Chapel Hill
address.state    String      NC
offices          String      ["Chapel Hill", "London", "New York"]
annual_revenue   Double      35,600,000

If FlattenObjects is not set, the address object is not broken apart into the address.street, address.city, and address.state columns. Instead, a single address column of type String represents the entire object, with the following value:

    {street: "Main Street", city: "Chapel Hill", state: "NC"}
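The flattening described above can be illustrated with a short, self-contained sketch. This is not the connector's implementation: the flatten_objects function is a hypothetical illustration that mimics the dotted column naming on a plain Python dictionary and, as an assumption made for this sketch, keeps arrays as JSON text.

```python
import json


def flatten_objects(record, prefix=""):
    """Flatten nested dicts into dotted column names.

    Illustrative only: arrays are kept as JSON text, and scalar
    values are copied through unchanged.
    """
    columns = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the nested object, extending the dotted prefix.
            columns.update(flatten_objects(value, prefix=f"{name}."))
        elif isinstance(value, list):
            columns[name] = json.dumps(value)
        else:
            columns[name] = value
    return columns


record = {
    "id": 12,
    "name": "Lohia Manufacturers Inc.",
    "annual_revenue": 35600000,
    "offices": ["Chapel Hill", "London", "New York"],
    "address": {"city": "Chapel Hill", "state": "NC", "street": "Main Street"},
}

flat = flatten_objects(record)
for column, value in flat.items():
    print(column, "=", value)
```

In this sketch the nested address object yields the address.city, address.state, and address.street keys, while offices remains a single string value.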
  

Using FlattenArrays

The FlattenArrays property can be used to flatten array values into columns of their own. This is recommended only for arrays that are expected to be short, such as the coordinates below:

"coord": [ -73.856077, 40.848447 ]

The FlattenArrays property can be set to 2 to represent the array above as follows:

Column Name   Data Type   Example Value
coord.0       Float       -73.856077
coord.1       Float       40.848447
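The column naming for flattened arrays can be sketched in the same way. The flatten_arrays function below is a hypothetical illustration, not the connector's code; it treats the limit argument as the number of array elements to turn into columns and, as an assumption of this sketch, drops any elements beyond that limit.

```python
def flatten_arrays(record, limit):
    """Turn the first `limit` elements of each array into indexed columns.

    Illustrative only: elements past `limit` are dropped in this sketch.
    """
    columns = {}
    for key, value in record.items():
        if isinstance(value, list):
            for index, item in enumerate(value[:limit]):
                columns[f"{key}.{index}"] = item
        else:
            columns[key] = value
    return columns


flat_coord = flatten_arrays({"coord": [-73.856077, 40.848447]}, limit=2)
for column, value in flat_coord.items():
    print(column, "=", value)
```

Because every flattened element becomes its own column, a long array would produce a wide table, which is why the approach suits short arrays.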

Copyright (c) 2024 CData Software, Inc. - All rights reserved.