Automatic Schema Discovery
The driver automatically infers a relational schema by retrieving the mapping of the Elasticsearch type. The columns and data types are generated from the retrieved mapping.
Detecting Arrays
Any field within Elasticsearch can be an array of values, but this is not explicitly defined within the mapping. To account for this, the driver will query the data to detect if any fields contain arrays. The number of Elasticsearch documents retrieved during this array scanning is based on the RowScanDepth property.
Elasticsearch nested types are special types that denote an array of objects and thus will always be treated as such when generating the metadata.
Detecting Columns
The columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties.
Example Data Set
To provide an example of how these options work, consider the following mapping (where 'insured' is the name of the table):
{ "insured": { "properties": { "name": { "type":"string" }, "address": { "street": { "type":"string" }, "city": { "type":"string" }, "state": { "type":"string" } }, "insured_ages": { "type": "integer" }, "vehicles": { "type": "nested", "properties": { "year": { "type":"integer" }, "make": { "type":"string" }, "model": { "type":"string" }, "body_style" { "type": "string" } } } } } }
Also consider the following example data for the above mapping:
{ "_source": { "name": "John Smith", "address": { "street": "Main Street", "city": "Chapel Hill", "state": "NC" }, "insured_ages": [ 17, 43, 45 ], "vehicles": [ { "year": 2015, "make": "Dodge", "model": "RAM 1500", "body_style": "TK" }, { "year": 2015, "make": "Suzuki", "model": "V-Strom 650 XT", "body_style": "MC" }, { "year": 2012, "make": "Honda", "model": "Accord", "body_style": "4D" } ] } }
Using FlattenObjects
If FlattenObjects is set, all nested objects will be flattened into a series of columns. The above example will be represented by the following columns:
Column Name | Data Type | Example Value |
name | String | John Smith |
address.street | String | Main Street |
address.city | String | Chapel Hill |
address.state | String | NC |
insured_ages | String | [ 17, 43, 45 ] |
vehicles | String | [ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ] |
If FlattenObjects is not set, then the address.street, address.city, and address.state columns will not be broken apart. The address column of type string will instead represent the entire object.
Its value would be the following:
{street: "Main Street", city: "Chapel Hill", state: "NC"}See JSON Functions for more details on working with JSON aggregates.
Using FlattenArrays
The FlattenArrays property can be used to flatten array values into columns of their own. This is only recommended for arrays that are expected to be short. It is best to leave unbounded arrays as they are and piece out the data for them as needed using JSON Functions.
Note: Only the top-most array will be flattened. Any subarrays will be represented as the entire array.
The FlattenArrays property can be set to 3 to represent the arrays in the example above as follows (this example is with FlattenObjects not set):
Column Name | Data Type | Example Value |
insured_ages | String | [ 17, 43, 45 ] |
insured_ages.0 | Integer | 17 |
insured_ages.1 | Integer | 43 |
insured_ages.2 | Integer | 45 |
vehicles | String | [ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ] |
vehicles.0 | String | { "year": "2015", "make": "Dodge", "model": "RAM 1500", "body_style": "TK" } |
vehicles.1 | String | { "year": "2015", "make": "Suzuki", "model": "V-Strom 650 XT", "body_style": "MC" } |
vehicles.2 | String | { "year": "2012", "make": "Honda", "model": "Accord", "body_style": "4D" } |
Using Both FlattenObjects and FlattenArrays
If FlattenObjects is set along with FlattenArrays (set to 1 for brevity), the vehicles field will be represented as follows:
Column Name | Data Type | Example Value |
vehicles | String | [ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ] |
vehicles.0.year | String | 2015 |
vehicles.0.make | String | Dodge |
vehicles.0.model | String | RAM 1500 |
vehicles.0.body_style | String | TK |