Query Processing

CData builds a client-side SQL engine into the driver library. This engine enables support for the full range of SQL-92 capabilities, including filters, aggregations, and functions.

For sources that do not support SQL-92, the driver offloads as much of the SQL statement processing as possible to Spark SQL and then processes the rest of the query in memory (client-side). Offloading work in this way keeps the amount of data that must be processed on the client as small as possible.
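As a rough illustration of the in-memory step, the sketch below assumes that a simplified query has already been sent to the server and that its rows have come back. It then applies the part of the query that was not offloaded, here a function-based filter and a grouped sum. The `Order` class, the sample rows, and the `ClientSideRemainder` class are hypothetical and are not part of the driver.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ClientSideRemainder {
    // Hypothetical row shape for an "orders" table.
    static final class Order {
        final String region;
        final String customer;
        final double amount;

        Order(String region, String customer, double amount) {
            this.region = region;
            this.customer = customer;
            this.amount = amount;
        }
    }

    // Stand-in for the rows returned by the offloaded part of the query, e.g.
    //   SELECT region, customer, amount FROM orders WHERE amount > 100
    static List<Order> fetchPushedDownRows() {
        return Arrays.asList(
                new Order("EU", "Acme GmbH", 250.0),
                new Order("EU", "Borealis", 120.0),
                new Order("US", "acme inc", 400.0),
                new Order("US", "Zenith", 150.0));
    }

    // Remainder evaluated in memory:
    //   WHERE UPPER(customer) LIKE 'ACME%' ... SUM(amount) GROUP BY region
    static Map<String, Double> totalByRegion(List<Order> rows) {
        return rows.stream()
                .filter(o -> o.customer.toUpperCase(Locale.ROOT).startsWith("ACME"))
                .collect(Collectors.groupingBy(
                        o -> o.region,
                        TreeMap::new, // sorted keys give a stable print order
                        Collectors.summingDouble(o -> o.amount)));
    }

    public static void main(String[] args) {
        System.out.println(totalByRegion(fetchPushedDownRows()));
        // prints {EU=250.0, US=400.0}
    }
}
```

In this sketch only four rows reach the client because the `amount > 100` filter is assumed to have run on the server; the client then does the work the server was not asked to do.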

For data sources with limited query capabilities, the driver transforms the SQL query into a simpler form that the data source can execute. The goal is to make smart decisions based on the query capabilities of the data source and push down as much of the computation as possible. The Spark SQL Query Evaluation component examines SQL queries and returns information indicating which parts of the query cannot be executed natively.
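The kind of decision described above can be sketched as a small planner. This is a simplified model, not the driver's actual Query Evaluation component: the `Op` enum, the `Plan` class, and the rule used here (push down the longest supported prefix of the query's operations, taken in logical evaluation order) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class PushdownPlanner {
    // Hypothetical categories of query operations.
    enum Op { FILTER, GROUP_BY, ORDER_BY, LIMIT }

    // Result of evaluating a query against a source's capabilities.
    static final class Plan {
        final List<Op> pushedDown = new ArrayList<>();
        final List<Op> clientSide = new ArrayList<>();
    }

    // queryOps must be listed in logical evaluation order. Once one operation
    // cannot run on the source, every later operation depends on its output,
    // so it has to run on the client as well.
    static Plan evaluate(List<Op> queryOps, Set<Op> sourceCapabilities) {
        Plan plan = new Plan();
        boolean pushing = true;
        for (Op op : queryOps) {
            if (pushing && sourceCapabilities.contains(op)) {
                plan.pushedDown.add(op);
            } else {
                pushing = false;
                plan.clientSide.add(op);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Assumed source: can filter and sort, but cannot group.
        Set<Op> capabilities = EnumSet.of(Op.FILTER, Op.ORDER_BY);
        Plan plan = evaluate(Arrays.asList(Op.FILTER, Op.GROUP_BY, Op.ORDER_BY), capabilities);
        System.out.println("pushed down: " + plan.pushedDown); // [FILTER]
        System.out.println("client side: " + plan.clientSide); // [GROUP_BY, ORDER_BY]
    }
}
```

Note that `ORDER_BY` ends up on the client even though the assumed source supports sorting, because it has to sort the grouped result that only exists client-side.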

The Spark SQL Query Slicer component is used in more specific cases to separate a single query into multiple independent queries. The client-side Query Engine decides when to simplify a query, when to break it into multiple queries, and whether to push aggregations down or compute them on the client side, all while minimizing the size of the result set.
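One hypothetical case of slicing is a query whose `IN` list is longer than a source accepts in a single request. The sketch below splits such a query into independent queries and merges partial `COUNT(*)` results on the client. The specific cases the Query Slicer handles are not described here, so the scenario, the `QuerySlicerSketch` class, and its method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class QuerySlicerSketch {
    // Splits "<selectPrefix> WHERE <column> IN (v1, ..., vn)" into independent
    // queries that each carry at most maxValues values.
    static List<String> slice(String selectPrefix, String column,
                              List<Integer> values, int maxValues) {
        if (maxValues < 1) {
            throw new IllegalArgumentException("maxValues must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < values.size(); start += maxValues) {
            int end = Math.min(start + maxValues, values.size());
            String inList = values.subList(start, end).stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(", "));
            queries.add(selectPrefix + " WHERE " + column + " IN (" + inList + ")");
        }
        return queries;
    }

    // Client-side merge of the per-slice COUNT(*) results. Summing is only
    // correct when the slices are disjoint, i.e. the IN values are distinct.
    static long mergeCounts(List<Long> partialCounts) {
        return partialCounts.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<String> slices = slice("SELECT COUNT(*) FROM orders", "customer_id",
                Arrays.asList(1, 2, 3, 4, 5), 2);
        slices.forEach(System.out::println);
        // SELECT COUNT(*) FROM orders WHERE customer_id IN (1, 2)
        // SELECT COUNT(*) FROM orders WHERE customer_id IN (3, 4)
        // SELECT COUNT(*) FROM orders WHERE customer_id IN (5)
    }
}
```

Each slice returns a single count, so the client receives one row per slice rather than the matching rows themselves, which is in line with the goal of keeping the result set small.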

There is a significant trade-off in evaluating queries, even partially, on the client side. Some queries cannot be executed efficiently in this model, and some are particularly expensive to compute this way. For that reason, CData always pushes down as much of the query as is feasible for the data source, which produces the most efficient query possible while still providing the most flexible query capabilities.

More Information

For a full discussion of how CData handles query processing, see CData Architecture: Query Execution.

JDBC Driver for Spark SQL
