CData Python Connector for XML

Build 23.0.8839

Automatic Schema Discovery

By default, the connector automatically infers a relational schema by inspecting a series of XML objects in an array. This section describes the connection properties available to configure these dynamic schemas.

Discovering Tables

This section shows how to fine-tune the discovered schemas by fine-tuning the row scan. The rows detected depend on the RowScanDepth, DataModel, and XPath properties.

  • RowScanDepth: This setting determines the number of object arrays to scan to find the rows and discover the columns. The process follows nested objects, counting 1 object array as 1 row.
  • DataModel: Select how you want to represent the object arrays as tables. This setting also affects the objects that are scanned: the Relational and FlattenedDocuments settings will find all object arrays (including nested ones) in RowScanDepth items. The Document setting will only find the first object array.

  • XPath: Set this to explicitly specify the object arrays you want to expose. Note that if DataModel is set to Document, you can only set one, top-level XPath. See Parsing Hierarchical Data for examples of accessing a sample document with different XPath values.

Discovering Columns

The columns identified during the discovery process depend on the FlattenArrays and FlattenObjects properties. If FlattenObjects is set (this is the default), nested objects will be flattened into a series of columns. For example, consider the following document:

<company>
  <id>12</id>
  <name>Lohia Manufacturers Inc.</name>
  <address>
    <street>Main Street</street>
    <city>Chapel Hill</city>
    <state>NC</state>
  </address>
  <office>Chapel Hill</office>
  <office>London</office>
  <office>New York</office>
  <annual_revenue>35,600,000</annual_revenue>
</company> 
With the default object flattening, this document will be represented by the following columns:

Column NameData TypeExample Value
idInteger12
nameStringLohia Manufacturers Inc.
address.streetStringMain Street
address.cityStringChapel Hill
address.stateStringNC
officeString<office>Chapel Hill</office><office>London</office><office>New York</office>
annual_revenueDouble35,600,000

If FlattenObjects is not set, then the address.street, address.city, and address.state columns will not be broken apart. The address column of type string will instead represent the entire object. Its value would be the following:

<address><street>Main Street</street><city>Chapel Hill</city><state>NC</state></address>

The FlattenArrays property can be used to flatten array values into columns of their own. This is only recommended for arrays that are expected to be short, for example the offices array below:

<office>London</office>
<office>Los Angeles</office>
The FlattenArrays property can be set to 2 to represent the array above as follows:

Column NameData TypeExample Value
office.0StringLondon
office.1StringLos Angeles

It is best to leave other unbounded arrays as they are and piece out the data for them as needed using XML Functions.

Copyright (c) 2024 CData Software, Inc. - All rights reserved.
Build 23.0.8839