BizTalk Adapter for Amazon DynamoDB

Build 20.0.7654

Querying The Data

DynamoDB has two distinct operations to read data from tables: query and scan. The CData BizTalk Adapter for Amazon DynamoDB will automatically attempt to figure out which type to use based on the WHERE clause of the SQL statement.

Query vs. Scan

In DynamoDB, a query is used when some of the data can be filtered before results are returned. This is done by the use of partition keys and sort keys that are defined on the table to perform the filter. It is typically much faster than a scan.

A scan is performed when anything other than a partition key or a sort key is used to filter the data. In the case of the scan, data is only filtered after it is returned. This is much slower since it will cause DynamoDB to return everything and simply exclude nonmatching results from the response. In the worst cases it will return full pages of data that are simply empty due to no matching results.

Secondary Indexes

Secondary indexes in DynamoDB behave like tables in themselves that are only accessible off of a given table. A secondary index behaves like a view either with predefined columns or simply with all columns being accessible. The main difference is that different partition and sort keys may be selected for them. There is no restriction on which keys to use - they may be the same as the partition and sort keys for the table itself, the same as ones from a different secondary index, or completely unique to that index.

The adapter by default will attempt to determine if a secondary index should be used to perform a query operation. This is done by analyzing the WHERE clause of the SQL statement. If it is determined that a query should be used and that a specific index should be used for that query, then the secondary index will automatically be selected. This behavior can be disabled by setting AutoDetectIndex to false.

A secondary index can be explicitly specified by using the SecondaryIndex pseudo column. This will override AutoDetectIndex and will be used for both query and scan operations. For example:

SELECT * FROM Table WHERE SecondaryIndex = 'ColumnName' AND ...

Query Examples

A query operation will be submitted to DynamoDB if the WHERE clause includes a single PartitionKey with an '=' operator and optionally up to one SortKey using any DynamoDB supported operator. For instance:

SELECT * FROM Table WHERE PartitionKey = 'x'

SELECT * FROM Table WHERE PartitionKey = 'x' AND SortKey > 'y' AND SortKey < 'z'

Additionally, if AutoDetectIndex is set to true, the adapter will attempt to automatically determine an index based on the specific keys used. For instance:

SELECT * FROM Table WHERE SecondaryIndexPartitionKey = 'x'

The preceding query would result in a query against the SecondaryIndex, provided that it was the only one that used the specified PartitionKey.

A secondary index may be explicitly specified via the SecondaryIndex pseudo column. This will cause the specified index to be used regardless of if a Query or Scan is submitted:

SELECT * FROM Table WHERE SecondaryIndex = 'IndexName'

Parallel Scans

For better performance when scanning results, parallel scans may be used. Parallel scans are a way to use multiple threads to execute the same SELECT statement. Each thread is responsible for retrieving an equal share of the data. Because threads retrieve data independently, it can exponentially improve performance. However, performance gains using this method will be limited to CPU performance, number of cores, bandwidth constraints, and read units for the table. A machine with more cores and more bandwidth may benefit from using additional threads whereas a machine with fewer cores and less bandwith available will see no benefit.

Amazon DynamoDB offers no equivalent for parallel scans when querying results. Queries can only be executed in a single thread.

By default, parallel scans are enabled. They can be disabled or changed by updating the ThreadCount connection property.

Please be aware that using parallel scans will consume your allocated read units for a given table at a much faster rate than using a single thread. If you are getting throttle exceptions, it is recommended to decrease the ThreadCount or set it to 1.

Copyright (c) 2020 CData Software, Inc. - All rights reserved.
Build 20.0.7654