Query Processing

CData has a client-side SQL engine built into the provider library. This enables support for the full capabilities that SQL-92 offers, including filters, aggregations, functions, etc.

For sources that do not support SQL-92, the provider offloads as much of SQL statement processing as possible to Azure Data Lake Storage and then processes the rest of the query in memory (client-side). Pushing work to the source in this way reduces the amount of data that has to be transferred and processed locally.

For data sources with limited query capabilities, the provider transforms the SQL query to simplify it. The goal is to make smart decisions based on the query capabilities of the data source so that as much of the computation as possible is pushed down. The Azure Data Lake Storage Query Evaluation component examines SQL queries and returns information indicating what parts of the query the provider is not capable of executing natively.
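The split between pushed-down and client-side work can be illustrated with a small, self-contained sketch. This is not the provider's implementation or API: `Predicate`, `SOURCE_SUPPORTED_OPS`, `split_predicates`, and `run_query` are hypothetical names, and the "source" is just an in-memory list that is assumed to support only equality filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """One filter condition, e.g. size > 1000 (hypothetical helper type)."""
    column: str
    op: str
    value: object


# Assumption for this sketch: the source can only evaluate equality filters.
SOURCE_SUPPORTED_OPS = {"="}

# Operators the client-side engine in this sketch knows how to evaluate.
OPS = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def split_predicates(predicates, supported_ops):
    """Separate filters the source can run from those left for the client."""
    pushed = [p for p in predicates if p.op in supported_ops]
    residual = [p for p in predicates if p.op not in supported_ops]
    return pushed, residual


def apply_predicates(rows, predicates):
    """Keep only the rows that satisfy every predicate."""
    return [
        row for row in rows
        if all(OPS[p.op](row[p.column], p.value) for p in predicates)
    ]


def run_query(source_rows, predicates, supported_ops):
    """Simulate push-down, then finish the remaining filters in memory."""
    pushed, residual = split_predicates(predicates, supported_ops)
    transferred = apply_predicates(source_rows, pushed)   # stands in for the source
    result = apply_predicates(transferred, residual)      # client-side remainder
    return result, len(transferred)


rows = [
    {"name": "a.csv", "type": "FILE", "size": 120},
    {"name": "b.csv", "type": "FILE", "size": 4096},
    {"name": "logs", "type": "DIRECTORY", "size": 0},
]
filters = [Predicate("type", "=", "FILE"), Predicate("size", ">", 1000)]
result, transferred = run_query(rows, filters, SOURCE_SUPPORTED_OPS)
print(result, transferred)
```

In this toy model the equality filter is evaluated "at the source", so only the rows that pass it are transferred, and the unsupported comparison is applied in memory afterwards.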

The Azure Data Lake Storage Query Slicer component is used in more specific cases to separate a single query into multiple independent queries. The client-side Query Engine makes decisions about simplifying queries, breaking queries into multiple queries, and pushing down or computing aggregations on the client-side while minimizing the size of the result set.
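As a rough illustration of slicing, the sketch below rewrites a single `IN (...)` filter as several independent equality lookups, merges the results, and then computes an aggregate on the client. It is a conceptual model only: `slice_in_filter`, `fetch_equal`, and `run_sliced_query` are hypothetical helpers, and the source is assumed to accept nothing more than one equality filter per request.

```python
def slice_in_filter(column, values):
    """Turn `column IN (v1, v2, ...)` into one equality filter per distinct value."""
    # dict.fromkeys removes duplicate values while preserving their order.
    return [(column, value) for value in dict.fromkeys(values)]


def fetch_equal(source_rows, column, value):
    """Stand-in for a source request that supports a single equality filter."""
    return [row for row in source_rows if row[column] == value]


def run_sliced_query(source_rows, column, values):
    """Run each slice independently and concatenate the results."""
    merged = []
    for col, value in slice_in_filter(column, values):
        merged.extend(fetch_equal(source_rows, col, value))
    return merged


files = [
    {"name": "a.csv", "folder": "raw", "size": 120},
    {"name": "b.csv", "folder": "raw", "size": 4096},
    {"name": "c.csv", "folder": "curated", "size": 300},
    {"name": "d.csv", "folder": "archive", "size": 900},
]

# Equivalent in spirit to: SELECT SUM(size) FROM files WHERE folder IN ('raw', 'curated')
matched = run_sliced_query(files, "folder", ["raw", "curated", "raw"])
total_size = sum(row["size"] for row in matched)  # aggregation done client-side
print([row["name"] for row in matched], total_size)
```

Because each slice selects a distinct value of the same column, the slices cannot return overlapping rows, so a plain concatenation is enough to merge them in this sketch.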

There's a significant trade-off in evaluating queries, even partially, client-side. Some queries cannot be executed efficiently in this model, and some are particularly expensive to compute this way. To keep queries as efficient as possible while still offering flexible query capabilities, CData always pushes down as much of the query as is feasible for the data source.

More Information

For a full discussion of how CData handles query processing, see CData Architecture: Query Execution.

CData Python Connector for Azure Data Lake Storage
