Automatic Schema Discovery
The add-in automatically infers a relational schema by retrieving the mapping of the HarperDB type. The columns and data types are generated from the retrieved data based on the value of the RowScanDepth property.
Detecting Data Types
HarperDB is schemaless and allows unstructured data. To map this data into a relational table, the add-in will detect the data types by scanning the data for each column. The number of rows scanned is configurable via RowScanDepth.
Detecting Columns
Since HarperDB is schemaless, columns may contain JSON objects or arrays. The add-in enables special handling of these JSON structures and enables you to flatten them. Therefore, columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties.
Example Data Set
To provide an example of how these options work, consider the following sample data:
[ { "name": "John Smith", "address": { "street": "Main Street", "city": "Chapel Hill", "state": "NC" }, "insured_ages": [ 17, 43, 45 ], "vehicles": [ { "year": 2015, "make": "Dodge", "model": "RAM 1500", "body_style": "TK" }, { "year": 2015, "make": "Suzuki", "model": "V-Strom 650 XT", "body_style": "MC" }, { "year": 2012, "make": "Honda", "model": "Accord", "body_style": "4D" } ] } ]
Using FlattenObjects
If FlattenObjects is set, all nested objects will be flattened into a series of columns. The above example will be represented by the following columns:
Column Name | Data Type | Example Value |
name | String | John Smith |
address.street | String | Main Street |
address.city | String | Chapel Hill |
address.state | String | NC |
insured_ages | String | [ 17, 43, 45 ] |
vehicles | String | [ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ] |
If FlattenObjects is not set, then the address.street, address.city, and address.state columns will not be broken apart. The address column of type string will instead represent the entire object.
Its value would be the following:
{street: "Main Street", city: "Chapel Hill", state: "NC"}See JSON Functions for more details on working with JSON aggregates.
Using FlattenArrays
The FlattenArrays property can be used to flatten array values into columns of their own. This is only recommended for arrays that are expected to be short. It is best to leave unbounded arrays as they are and piece out the data for them as needed using JSON Functions.
Note: Only the top-most array will be flattened. Any subarrays will be represented as the entire array.
The FlattenArrays property can be set to 3 to represent the arrays in the example above as follows (this example is with FlattenObjects not set):
Column Name | Data Type | Example Value |
insured_ages | String | [ 17, 43, 45 ] |
insured_ages.0 | Integer | 17 |
insured_ages.1 | Integer | 43 |
insured_ages.2 | Integer | 45 |
vehicles | String | [ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ] |
vehicles.0 | String | { "year": "2015", "make": "Dodge", "model": "RAM 1500", "body_style": "TK" } |
vehicles.1 | String | { "year": "2015", "make": "Suzuki", "model": "V-Strom 650 XT", "body_style": "MC" } |
vehicles.2 | String | { "year": "2012", "make": "Honda", "model": "Accord", "body_style": "4D" } |
Using Both FlattenObjects and FlattenArrays
If FlattenObjects is set along with FlattenArrays (set to 1 for brevity), the vehicles field will be represented as follows:
Column Name | Data Type | Example Value |
vehicles | String | [ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ] |
vehicles.0.year | String | 2015 |
vehicles.0.make | String | Dodge |
vehicles.0.model | String | RAM 1500 |
vehicles.0.body_style | String | TK |