Customizing Amazon Athena Behavior
Follow the steps below to configure the component defaults for querying Amazon Athena.
Cleaning Query Results
Amazon Athena stores the results of every query you execute in CSV files in S3StagingDirectory; these can quickly rack up a lot of space in Amazon S3. You can use CleanQueryResults, enabled by default, to clean these files for every query executed.
Note that this behavior will add a minor performance hit when you disconnect the last connection in a process.
Using Athena's Query Caching
You configure QueryCachingLevel to modify the usage of the query results stored in S3StagingDirectory; note that you have to keep the connection open to benefit from this feature. This is especially helpful when executing a certain query multiple times. This means Amazon Athena will not scan the same data again and simply use the results from the previous execution. These results are cleaned in the amount of seconds specified in QueryTolerance.
Note that failing to properly disconnect the connection when QueryCachingLevel is set to Cloud may lead to a large amount of saved queries in Athena. For most use cases setting QueryCachingLevel to Local should be enough.
Amazon Athena supports a set of queries that are not specified in the regular SQL-92 standard; to execute these queries simply set QueryPassthrough to true. This will pass the query directly to Amazon Athena without parsing it internally.
Encrypting Query Results to S3
Set EncryptionKey and EncryptionType if you would like to encrypt the result set stored in S3 after query execution.
To encrypt results stored in S3, follow the steps below:
- Navigate to Amazon Athena Console.
- Click Settings.
- Enable the Encrypt Query Results option.
- Select the Encryption type and set EncryptionType when you connect.
- If you selected SSE-KMS or CSE-KMS, set both EncryptionType and EncryptionKey: select an encryption key in the menu or click Create KMS Key.
Fine Tuning Performance
You can use the PageSize property to optimize use of your provisioned throughput, based on the size of your items and Amazon Athena's 1000MB page size. Set this property to the number of items to return.
Generally, a smaller page size reduces spikes in throughput that cause throttling. A smaller page size also inserts pauses between requests. This interval evens out the distribution of requests and allows more requests to be successful by avoiding throttling.
Connecting Through a Firewall or Proxy
To connect through the Windows system proxy, you do not need to set any additional connection properties. To connect to other proxies, set ProxyAutoDetect to false and in addition set the following.
To authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword, in addition to ProxyServer and ProxyPort.
Set the following properties: Set FirewallType, FirewallServer, and FirewallPort. To tunnel the connection, set FirewallType to TUNNEL. To authenticate specify FirewallUser and FirewallPassword. To authenticate to a SOCKS proxy, additionally set FirewallType to SOCKS5.
Troubleshooting the Connection
To show component activity from query execution to HTTP calls, use Logfile and Verbosity. The examples of common connection errors below show how to use these properties to get more context. Contact the support team for help tracing the source of an error or circumventing a performance issue.
- Authentication errors: Typically, recording a Logfile at Verbosity 4 is necessary to get full details on an authentication error.
- Queries time out: A server that takes too long to respond will exceed the component's client-side timeout. Often, setting the Timeout property to a higher value will avoid a connection error. Another option is to disable the timeout by setting the property to 0. Setting Verbosity to 2 will show where the time is being spent.