Automatic Schema Discovery

The provider automatically infers a relational schema by retrieving the mapping of the HarperDB type. The columns and data types are generated from the retrieved data based on the value of the RowScanDepth property.

Detecting Data Types

HarperDB is schemaless and allows unstructured data. To map this data into a relational table, the provider will detect the data types by scanning the data for each column. The number of rows scanned is configurable via RowScanDepth.

Detecting Columns

Since HarperDB is schemaless, columns may contain JSON objects or arrays. The provider enables special handling of these JSON structures and enables you to flatten them. Therefore, columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties.

Example Data Set

To provide an example of how these options work, consider the following sample data:

[
  {
    "name": "John Smith",
    "address": {
      "street": "Main Street",
      "city": "Chapel Hill",
      "state": "NC"
    },
    "insured_ages": [ 17, 43, 45 ], 
    "vehicles": [
      {
        "year": 2015,
        "make": "Dodge",
        "model": "RAM 1500",
        "body_style": "TK"
      },
      {
        "year": 2015,
        "make": "Suzuki",
        "model": "V-Strom 650 XT",
        "body_style": "MC"
      },
      {
        "year": 2012,
        "make": "Honda",
        "model": "Accord",
        "body_style": "4D"
      }
    ]
  }
]

Using FlattenObjects

If FlattenObjects is set, all nested objects will be flattened into a series of columns. The above example will be represented by the following columns:


Column Name	Data Type	Example Value
name	String	John Smith
address.street	String	Main Street
address.city	String	Chapel Hill
address.state	String	NC
insured_ages	String	[ 17, 43, 45 ]
vehicles	String	[ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ]

If FlattenObjects is not set, then the address.street, address.city, and address.state columns will not be broken apart. The address column of type string will instead represent the entire object. Its value would be the following:

{street: "Main Street", city: "Chapel Hill", state: "NC"}

See JSON Functions for more details on working with JSON aggregates.

Using FlattenArrays

The FlattenArrays property can be used to flatten array values into columns of their own. This is only recommended for arrays that are expected to be short. It is best to leave unbounded arrays as they are and piece out the data for them as needed using JSON Functions.

Note: Only the top-most array will be flattened. Any subarrays will be represented as the entire array.

The FlattenArrays property can be set to 3 to represent the arrays in the example above as follows (this example is with FlattenObjects not set):


Column Name	Data Type	Example Value
insured_ages	String	[ 17, 43, 45 ]
insured_ages.0	Integer	17
insured_ages.1	Integer	43
insured_ages.2	Integer	45
vehicles	String	[ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ]
vehicles.0	String	{ "year": "2015", "make": "Dodge", "model": "RAM 1500", "body_style": "TK" }
vehicles.1	String	{ "year": "2015", "make": "Suzuki", "model": "V-Strom 650 XT", "body_style": "MC" }
vehicles.2	String	{ "year": "2012", "make": "Honda", "model": "Accord", "body_style": "4D" }

Using Both FlattenObjects and FlattenArrays

If FlattenObjects is set along with FlattenArrays (set to 1 for brevity), the vehicles field will be represented as follows:


Column Name	Data Type	Example Value
vehicles	String	[ { "year": "2015", "make": "Dodge", ... }, { "year": "2015", "make": "Suzuki", ... }, { "year": "2012", "make": "Honda", ... } ]
vehicles.0.year	String	2015
vehicles.0.make	String	Dodge
vehicles.0.model	String	RAM 1500
vehicles.0.body_style	String	TK