Modeling XML Data
In this section we will show how to control the various schemes that the connector offers to bridge the gap with relational SQL and XML services. The CData Python Connector for XML provides a managed way for you to use the two prevailing techniques for dealing with nested XML data:
- Parsing the data structure and building a relational model based on the existing hierarchy.
- Drilling down into the nested elements using horizontal and vertical flattening.
Parsing Hierarchical Data
By default, the connector automatically detects the rows in a document, so that you do not need to know the structure of the XML to query it with SQL. Set the DataModel property to choose a basic configuration of how the connector models the rows into tables.
Flattening Objects and Arrays into Rows
To flatten data, you only need to be familiar with two data structures:
- Object: Any parent element that does not repeat at the same height.
- Array: Any element that repeats at the same height.
In the following example from the people collection, maintenance is an object array, since each maintenance node has child elements.
<maintenance>
<date>07-17-2017</date>
<desc>oil change</desc>
</maintenance>
<maintenance>
<date>01-03-2018</date>
<desc>new tires</desc>
</maintenance>
Configuring Automatic Schema Discovery
The connector discovers columns and data types by scanning the RowScanDepth count of XML objects in XML arrays. Set the FlattenObjects and FlattenArrays properties to configure how nested data is flattened into columns; see Automatic Schema Discovery for examples.
Executing SQL to XML
Any relation you can access through flattening you can also access with an ad-hoc SQL query. The connector enables you to query nested data with the following capabilities:
- Vertical Flattening: Access nested object arrays as separate tables.
- Free-Form Queries: Query any nested structure without flattening the data.
- XML Functions: Manipulate the data returned to perform client-side aggregation and transformations.
Customizing Schemas
Customizing Schemas enables you to project your chosen relational structure on top of an XML document. This allows you to choose the names of columns, their data types, and the locations of their values in the document.
System Catalog
The System Tables reflect the schemas you configured, custom schemas or dynamically discovered. The Stored Procedures surface additional functionality in the connector's data processing operations that cannot be modeled as SELECT, INSERT, UPDATE, or DELETE. You can find the reported stored procedures defined in .rsb files in the folder specified by Location -- if Location is not specified, the db subfolder of the installation directory.