JDBC Driver for Spark SQL

Build 21.0.7930

Automatically Caching Data

Automatically caching data is useful when you do not want to rebuild the cache for each query. When you query data for the first time, the driver automatically initializes and builds a cache in the background. When AutoCache = true, the driver uses the cache for subsequent query executions, resulting in faster response times.

Configuring Automatic Caching

Caching the Customers Table

The following example caches the Customers table in the file specified by the CacheLocation property of the connection string.

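import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// AutoCache=true tells the driver to cache query results in the file named by Cache Location.
String connectionString = "jdbc:sparksql:Cache Location=C:\\cache.db;" +
                          "AutoCache=true;" +
                          "Server=127.0.0.1;";
// try-with-resources closes the result set, statement, and connection even if the query fails.
try (Connection connection = DriverManager.getConnection(connectionString);
     Statement stat = connection.createStatement();
     ResultSet rs = stat.executeQuery("SELECT City, CompanyName FROM Customers WHERE Country = 'US'")) {
  while (rs.next()) {
    // Read a column that the query actually selects.
    System.out.println("Read and cached the row for " + rs.getString("CompanyName"));
  }
}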

Common Use Case

A common use for automatically caching data is to improve driver performance when making repeated requests to a live data source, such as when building a report or creating a visualization. With automatic caching enabled, repeated requests for the same data made within a short period of time are answered from the cache, provided the cached data is still within the allowable tolerance (CacheTolerance) of what is considered "live" data.
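
As an illustration of this use case, the following sketch assembles a connection string that enables AutoCache and sets CacheTolerance. The class AutoCacheConnectionString and its methods are hypothetical helpers and are not part of the driver. The sketch assumes that CacheTolerance is specified in seconds, so confirm the unit and default against the connection property reference for your build. The value 300 is an arbitrary example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class for illustration; not part of the driver.
public class AutoCacheConnectionString {

    // Joins name=value pairs into a "jdbc:sparksql:" URL, keeping insertion order.
    public static String build(Map<String, String> properties) {
        StringJoiner joiner = new StringJoiner(";", "jdbc:sparksql:", ";");
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    // Builds a connection string with automatic caching and a staleness tolerance.
    // Assumption: CacheTolerance is interpreted by the driver as a number of seconds.
    public static String withAutoCache(String server, String cacheLocation, int toleranceSeconds) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("Cache Location", cacheLocation);
        properties.put("AutoCache", "true");
        properties.put("CacheTolerance", String.valueOf(toleranceSeconds));
        properties.put("Server", server);
        return build(properties);
    }

    public static void main(String[] args) {
        // Pass the resulting string to DriverManager.getConnection as in the example above.
        System.out.println(withAutoCache("127.0.0.1", "C:\\cache.db", 300));
    }
}
```

A report that refreshes every few minutes could use a connection string like this one so that its repeated queries are answered from the cache instead of going back to the live data source each time.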

Copyright (c) 2021 CData Software, Inc. - All rights reserved.