Explicitly Caching Data
With explicit caching (AutoCache = false), you decide exactly what data is cached and when to query the cache instead of the live data. CACHE statements give you full control over the cache contents. This section describes some strategies for using the caching features offered by the driver.
Creating the Cache
To load data into the cache, issue the following statement:
CACHE SELECT * FROM tableName WHERE ...
Once the statement is issued, any matching data in tableName is loaded into the corresponding table in the cache.
Updating the Cache
This section describes two ways to update the cache.
Updating with the SELECT Statement
The following example shows a statement that updates modified rows and adds missing rows in the cached table. It only merges new rows and updates existing ones; it does not delete rows that are in the cache but no longer match the live data.
String cmd = "CACHE SELECT * FROM Files WHERE FileId = '119116'";
stat.execute(cmd);
connection.close();
Updating with the TRUNCATE Statement
The following example shows a statement that updates modified rows and adds missing rows in the cached table. Unlike a plain CACHE SELECT, it also deletes rows in the cached table that are not present in the live data source.
String cmd = "CACHE WITH TRUNCATE SELECT * FROM Files WHERE FileId = '119116'";
stat.execute(cmd);
connection.close();
Query the Data in Online or Offline Mode
This section describes how to query the data in online or offline mode.
Online: Select Cached Tables
You can use the tableName#CACHE syntax to explicitly execute queries against the cache while still online, as shown in the following example.
SELECT * FROM Files#CACHE
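From Java, a query against the cached table can be issued like any other statement. The following is a minimal sketch, not the driver's own API: the class and method names (CacheQueryExample, cacheTable, selectAllFromCache, countCachedRows) are hypothetical helpers, and the caller is assumed to supply an open connection that was created without Offline = true.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; none of these names come from the driver.
final class CacheQueryExample {
    // Appends the #CACHE suffix described above to a table name.
    static String cacheTable(String tableName) {
        return tableName + "#CACHE";
    }

    // Builds a SELECT that targets the cached copy of the table.
    static String selectAllFromCache(String tableName) {
        return "SELECT * FROM " + cacheTable(tableName);
    }

    // Counts the rows currently held in the cached table.
    // Assumption: the caller passes an open, online connection.
    static int countCachedRows(Connection connection, String tableName) throws SQLException {
        int rows = 0;
        try (Statement stat = connection.createStatement();
             ResultSet rs = stat.executeQuery(selectAllFromCache(tableName))) {
            while (rs.next()) {
                rows++;
            }
        }
        return rows;
    }
}
```

Calling countCachedRows(connection, "Files") would run SELECT * FROM Files#CACHE and iterate the result. The try-with-resources block closes the statement and result set even if the query fails; closing the connection is left to the caller.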
Offline: Select Cached Tables
With Offline = true, SELECT statements always execute against the local cache database, whether or not you explicitly specify the cached table. Modification of the cache is disabled in Offline mode to prevent accidentally updating only the cached data. Executing a DELETE, UPDATE, or INSERT statement while in Offline mode results in an exception.
The following example selects from the local cache rather than the live data source because Offline = true.
Connection connection = DriverManager.getConnection("jdbc:hdfs:Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;Offline=true;Cache Location=C:\\cache.db;");
Statement stat = connection.createStatement();
String query = "SELECT * FROM Files WHERE FileId='119116' ORDER BY ChildrenNum ASC";
stat.execute(query);
connection.close();
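Because DELETE, UPDATE, and INSERT statements raise an exception in Offline mode, an application may prefer to reject them before they reach the driver. The following is a rough sketch of such a client-side check. OfflineGuard and its methods are hypothetical and not part of the driver, and the check is a simple prefix match rather than a SQL parser.

```java
import java.util.Locale;

// Hypothetical client-side guard; not part of the driver.
final class OfflineGuard {
    // Returns true when the statement starts with INSERT, UPDATE, or DELETE,
    // the statement types that result in an exception in Offline mode.
    static boolean modifiesData(String sql) {
        String head = sql.trim().toUpperCase(Locale.ROOT);
        return head.startsWith("INSERT")
            || head.startsWith("UPDATE")
            || head.startsWith("DELETE");
    }

    // Rejects data-modifying statements up front when the connection is offline.
    static void checkAllowed(String sql, boolean offline) {
        if (offline && modifiesData(sql)) {
            throw new IllegalStateException(
                "Statement modifies data but Offline = true: " + sql);
        }
    }
}
```

Calling OfflineGuard.checkAllowed(query, true) before stat.execute(query) lets the application fail with its own message instead of relying on the driver's exception.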
Delete Data from the Cache
The driver itself does not support manually deleting data from the cache. To delete cached data, connect directly to the cache database and remove the data there.
Common Use Case
A common use for caching is to have an application always query the cached data and update the cache only at set intervals, such as once every day or every two hours. This can be implemented in two ways:
- AutoCache = false and Offline = false. All queries issued by the application explicitly reference the tableName#CACHE table. When the cache needs to be updated, the application executes a CACHE ... statement to bring the cached data up to date.
- Offline = true. Caching is transparent to the application. All queries are executed against the table as normal, so most application code does not need to be aware that caching is done. To update the cached data, create a separate connection with Offline = false and execute a CACHE ... statement.
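Either approach needs something that runs the cache update on a schedule. The following is a minimal sketch using java.util.concurrent. CacheRefresher and its method names are hypothetical, the connection URL is assumed to be an online one (Offline = false), and the SELECT to cache is passed in by the caller.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class; not part of the driver.
final class CacheRefresher {
    // Prefixes a SELECT with CACHE, or with CACHE WITH TRUNCATE to also
    // delete cached rows that are missing from the live data source.
    static String buildCacheStatement(String selectQuery, boolean truncate) {
        return (truncate ? "CACHE WITH TRUNCATE " : "CACHE ") + selectQuery;
    }

    // Opens a short-lived online connection and refreshes the cache once.
    static void refreshOnce(String onlineUrl, String selectQuery, boolean truncate)
            throws SQLException {
        try (Connection connection = DriverManager.getConnection(onlineUrl);
             Statement stat = connection.createStatement()) {
            stat.execute(buildCacheStatement(selectQuery, truncate));
        }
    }

    // Schedules a refresh at a fixed interval, starting immediately.
    // The caller is responsible for shutting the executor down on exit.
    static ScheduledExecutorService schedule(String onlineUrl, String selectQuery,
                                             boolean truncate, long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                refreshOnce(onlineUrl, selectQuery, truncate);
            } catch (SQLException e) {
                // Report and continue; an uncaught exception would cancel later runs.
                System.err.println("Cache refresh failed: " + e.getMessage());
            }
        }, 0, period, unit);
        return executor;
    }
}
```

For example, CacheRefresher.schedule(onlineUrl, "SELECT * FROM Files", true, 2, TimeUnit.HOURS) would refresh the Files cache every two hours, while the rest of the application keeps querying through a connection with Offline = true or through Files#CACHE.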