Automatic Schema Discovery
By default, the driver automatically infers a relational schema by inspecting the Avro data. This section describes the connection properties available to configure these dynamic schemas.
Detecting Columns
The columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties. If FlattenObjects is set (this is the default), nested objects will be flattened into a series of columns.
Example Data Set
To provide an example of how these options work, consider the following schema:
{
  "type": "record",
  "name": "Root",
  "fields": [
    { "name": "id", "type": [ "null", "long" ] },
    { "name": "name", "type": [ "null", "string" ] },
    { "name": "annual_revenue", "type": [ "null", "long" ] },
    { "name": "offices", "type": { "type": "array", "items": "string" } },
    {
      "name": "address",
      "type": [
        "null",
        {
          "type": "record",
          "name": "Address",
          "namespace": "root",
          "fields": [
            { "name": "city", "type": [ "null", "string" ] },
            { "name": "state", "type": [ "null", "string" ] },
            { "name": "street", "type": [ "null", "string" ] }
          ]
        }
      ]
    }
  ]
}
Also consider the following example data for the above schema:
{
  "id": 12,
  "name": "Lohia Manufacturers Inc.",
  "annual_revenue": 35600000,
  "offices": [ "Chapel Hill", "London", "New York" ],
  "address": {
    "city": "Chapel Hill",
    "state": "NC",
    "street": "Main Street"
  }
}
Using FlattenObjects
If FlattenObjects is set, all nested objects will be flattened into a series of columns. The above example will be represented by the following columns:
Column Name | Data Type | Example Value |
id | Integer | 12 |
name | String | Lohia Manufacturers Inc. |
address.street | String | Main Street |
address.city | String | Chapel Hill |
address.state | String | NC |
offices | String | ["Chapel Hill", "London", "New York"] |
annual_revenue | Double | 35600000 |
If FlattenObjects is not set, the address object is not broken apart into the address.street, address.city, and address.state columns. Instead, a single address column of type String represents the entire object. Its value would be the following:
{street: "Main Street", city: "Chapel Hill", state: "NC"}
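The column mapping described above can be approximated with a short Python sketch. This is an illustration only, not the driver's implementation: the function name flatten_objects and its flatten parameter are hypothetical, and the sketch assumes that arrays and unflattened objects are serialized as JSON text, which differs slightly from the unquoted-key form shown above.

```python
import json


def flatten_objects(record, flatten=True, prefix=""):
    """Map one decoded record to a {column_name: value} dictionary.

    Hypothetical helper: it mimics the documented column naming
    (parent.child) but is not the driver's actual code.
    """
    columns = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict) and flatten:
            # Nested object: recurse and prefix child columns with "parent."
            columns.update(flatten_objects(value, flatten, prefix=f"{name}."))
        elif isinstance(value, (dict, list)):
            # Assumption: unflattened objects and arrays become JSON strings
            columns[name] = json.dumps(value)
        else:
            columns[name] = value
    return columns


record = {
    "id": 12,
    "name": "Lohia Manufacturers Inc.",
    "annual_revenue": 35600000,
    "offices": ["Chapel Hill", "London", "New York"],
    "address": {"city": "Chapel Hill", "state": "NC", "street": "Main Street"},
}

flat = flatten_objects(record)                   # address.* columns
unflat = flatten_objects(record, flatten=False)  # single address column
```

With flatten=True the result contains address.city, address.state, and address.street keys; with flatten=False it contains one address key holding the whole object as a string.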
Using FlattenArrays
The FlattenArrays property can be used to flatten array values into columns of their own. This is only recommended for arrays that are expected to be short, for example the coordinates below:
"coord": [ -73.856077, 40.848447 ]
The FlattenArrays property can be set to 2 to represent the array above as follows:
Column Name | Data Type | Example Value |
coord.0 | Float | -73.856077 |
coord.1 | Float | 40.848447 |
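As a rough illustration of the naming scheme in the table above, the following Python sketch expands the first few elements of an array into indexed columns. The helper flatten_array and its limit parameter are hypothetical stand-ins for the FlattenArrays setting, and the sketch assumes that elements beyond the limit are simply dropped; check the driver's behavior before relying on that.

```python
def flatten_array(name, values, limit):
    """Expand up to `limit` array elements into name.0, name.1, ... columns.

    Hypothetical helper for illustration; not the driver's actual code.
    """
    columns = {}
    for index, value in enumerate(values[:limit]):
        columns[f"{name}.{index}"] = value
    # Assumption: elements at positions >= limit are ignored
    return columns


coord_columns = flatten_array("coord", [-73.856077, 40.848447], limit=2)
office_columns = flatten_array(
    "offices", ["Chapel Hill", "London", "New York"], limit=2
)
```

Here coord_columns holds the coord.0 and coord.1 entries from the table, while office_columns shows why the setting suits short arrays: with a limit of 2, the third office does not get a column.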