CData Cloud offers access to Apache Hive across several standard services and protocols, in a cloud-hosted solution. Any application that can connect to a MySQL or SQL Server database can connect to Apache Hive through CData Cloud.
CData Cloud allows you to standardize and configure connections to Apache Hive as though it were any other OData endpoint, or standard SQL Server/MySQL database.
This page provides a guide to Establishing a Connection to Apache Hive in CData Cloud, as well as information on the available resources, and a reference to the available connection properties.
Establishing a Connection shows how to authenticate to Apache Hive and configure any necessary connection properties to create a database in CData Cloud.
Accessing data from Apache Hive through the available standard services and CData Cloud administration is documented in further detail in the CData Cloud Documentation.
Connect to Apache Hive by selecting the corresponding icon in the Database tab. Required properties are listed under Settings. The Advanced tab lists connection properties that are not typically required.
Do the following:
gcloud compute instances list
Note the external IP of the relevant machine.
To authenticate to Apache Hive with PLAIN SASL, set the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to None and set the following Cloud connection properties:
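For example, a minimal PLAIN SASL connection might combine the properties like this (the Server, User, and Password values are placeholders):
Server=127.0.0.1;Port=10000;TransportMode=BINARY;AuthScheme=Plain;User=hive_user;Password=hive_password;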
To authenticate to Apache Hive with Kerberos, set AuthScheme to KERBEROS.
Authenticating to Apache Hive via Kerberos requires you to define authentication properties and to choose how Kerberos should retrieve authentication tickets.
The Cloud provides three ways to retrieve the required Kerberos ticket, depending on whether or not the KRB5CCNAME and/or KerberosKeytabFile variables exist in your environment.
MIT Kerberos Credential Cache File
This option enables you to use the MIT Kerberos Ticket Manager or kinit command to get tickets. With this option there is no need to set the User or Password connection properties.
This option requires that KRB5CCNAME has been created in your system.
To enable ticket retrieval via MIT Kerberos Credential Cache Files:
If the ticket is successfully obtained, the ticket information appears in Kerberos Ticket Manager and is stored in the credential cache file.
The Cloud uses the cache file to obtain the Kerberos ticket to connect to Apache Hive.
Note: If you would prefer not to edit KRB5CCNAME, you can use the KerberosTicketCache property to set the file path manually. After this is set, the Cloud uses the specified cache file to obtain the Kerberos ticket to connect to Apache Hive.
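For example, a connection using a hypothetical cache file path might look like the following:
AuthScheme=KERBEROS;Server=hiveserver;Port=10000;KerberosTicketCache=C:\Temp\krb5cc_0;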
Keytab File
If your environment lacks the KRB5CCNAME environment variable, you can retrieve a Kerberos ticket using a Keytab File.
To use this method, set the User property to the desired username, and set the KerberosKeytabFile property to a file path pointing to the keytab file associated with the user.
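For example, assuming a hypothetical keytab file path:
AuthScheme=KERBEROS;Server=hiveserver;Port=10000;User=hive_user;KerberosKeytabFile=C:\krb5\hive_user.keytab;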
User and Password
If your environment lacks the KRB5CCNAME environment variable and the KerberosKeytabFile property has not been set, you can retrieve a ticket using a user and password combination.
To use this method, set the User and Password properties to the user/password combination that you use to authenticate with Apache Hive.
If the user and the Hive service belong to different Kerberos realms, enable cross-realm authentication by setting the KerberosRealm and KerberosKDC properties to the values required for user authentication, and setting the KerberosServiceRealm and KerberosServiceKDC properties to the values required to obtain the service ticket.
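For example, a cross-realm connection might look like the following (realm and KDC values are placeholders):
AuthScheme=KERBEROS;User=hive_user;Password=hive_password;KerberosRealm=USERREALM.COM;KerberosKDC=userkdc.userrealm.com;KerberosServiceRealm=SERVICEREALM.COM;KerberosServiceKDC=servicekdc.servicerealm.com;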
Apache Hive supports multiple ways to perform similar operations. The options below allow you to configure which HiveQL statement is issued to perform an operation.
When set to true, the Cloud will perform INSERT queries using the INSERT INTO SELECT syntax: INSERT INTO TABLE [table] SELECT T.* FROM (....) AS T.
If set to false (default), the INSERT INTO VALUES syntax will be used: INSERT INTO TABLE [table] VALUES (....).
The Cloud automatically determines which syntax is supported by your Hive server, but setting this option to true forces the INSERT INTO SELECT syntax to be used.
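As an illustration, inserting a single row into a hypothetical Customers table would produce one of the following two HiveQL statements, depending on this setting:
INSERT INTO TABLE Customers SELECT T.* FROM (SELECT 'Jane Doe', 'Alpha Co') AS T
INSERT INTO TABLE Customers VALUES ('Jane Doe', 'Alpha Co')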
By default, the Cloud attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store.
To specify another certificate, see the SSLServerCert property for the available formats to do so.
To connect through the Windows system proxy, you do not need to set any additional connection properties. To connect to other proxies, set ProxyAutoDetect to false.
To authenticate to an HTTP proxy, set ProxyAuthScheme, ProxyUser, and ProxyPassword, in addition to ProxyServer and ProxyPort.
Set the following properties:
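For example, a connection through a hypothetical HTTP proxy might set:
ProxyAutoDetect=false;ProxyServer=proxy.example.com;ProxyPort=8080;ProxyAuthScheme=BASIC;ProxyUser=proxy_user;ProxyPassword=proxy_password;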
The Cloud models Apache Hive instances as relational databases. Hive versions 0.11.0 and above are supported. The Cloud leverages the HiveServer2 Thrift API to enable bidirectional access to Apache Hive data through SQL.
You can query the system tables described in this section to access schema information, information on data source functionality, and batch operation statistics.
The following tables return database metadata for Apache Hive:
The following tables return information about how to connect to and query the data source:
The following table returns query statistics for data modification queries, including batch operations:
Lists the available databases.
The following query retrieves all databases determined by the connection string:
SELECT * FROM sys_catalogs
Name | Type | Description |
CatalogName | String | The database name. |
Lists the available schemas.
The following query retrieves all available schemas:
SELECT * FROM sys_schemas
Name | Type | Description |
CatalogName | String | The database name. |
SchemaName | String | The schema name. |
Lists the available tables.
The following query retrieves the available tables and views:
SELECT * FROM sys_tables
Name | Type | Description |
CatalogName | String | The database containing the table or view. |
SchemaName | String | The schema containing the table or view. |
TableName | String | The name of the table or view. |
TableType | String | The table type (table or view). |
Description | String | A description of the table or view. |
IsUpdateable | Boolean | Whether the table can be updated. |
Describes the columns of the available tables and views.
The following query returns the columns and data types for the [CData].[Default].Customers table:
SELECT ColumnName, DataTypeName FROM sys_tablecolumns WHERE TableName='Customers' AND CatalogName='CData' AND SchemaName='Default'
Name | Type | Description |
CatalogName | String | The name of the database containing the table or view. |
SchemaName | String | The schema containing the table or view. |
TableName | String | The name of the table or view containing the column. |
ColumnName | String | The column name. |
DataTypeName | String | The data type name. |
DataType | Int32 | An integer indicating the data type. This value is determined at run time based on the environment. |
Length | Int32 | The storage size of the column. |
DisplaySize | Int32 | The designated column's normal maximum width in characters. |
NumericPrecision | Int32 | The maximum number of digits in numeric data. The column length in characters for character and date-time data. |
NumericScale | Int32 | The column scale or number of digits to the right of the decimal point. |
IsNullable | Boolean | Whether the column can contain null. |
Description | String | A brief description of the column. |
Ordinal | Int32 | The sequence number of the column. |
IsAutoIncrement | String | Whether the column value is assigned in fixed increments. |
IsGeneratedColumn | String | Whether the column is generated. |
IsHidden | Boolean | Whether the column is hidden. |
IsArray | Boolean | Whether the column is an array. |
IsReadOnly | Boolean | Whether the column is read-only. |
IsKey | Boolean | Indicates whether a field returned from sys_tablecolumns is the primary key of the table. |
Lists the available stored procedures.
The following query retrieves the available stored procedures:
SELECT * FROM sys_procedures
Name | Type | Description |
CatalogName | String | The database containing the stored procedure. |
SchemaName | String | The schema containing the stored procedure. |
ProcedureName | String | The name of the stored procedure. |
Description | String | A description of the stored procedure. |
ProcedureType | String | The type of the procedure, such as PROCEDURE or FUNCTION. |
Describes stored procedure parameters.
The following query returns information about all of the input parameters for the SearchSuppliers stored procedure:
SELECT * FROM sys_procedureparameters WHERE ProcedureName='SearchSuppliers' AND (Direction=1 OR Direction=2)
Name | Type | Description |
CatalogName | String | The name of the database containing the stored procedure. |
SchemaName | String | The name of the schema containing the stored procedure. |
ProcedureName | String | The name of the stored procedure containing the parameter. |
ColumnName | String | The name of the stored procedure parameter. |
Direction | Int32 | An integer corresponding to the type of the parameter: input (1), input/output (2), or output (4). Input/output parameters can act as both input and output parameters. |
DataTypeName | String | The name of the data type. |
DataType | Int32 | An integer indicating the data type. This value is determined at run time based on the environment. |
Length | Int32 | The number of characters allowed for character data. The number of digits allowed for numeric data. |
NumericPrecision | Int32 | The maximum precision for numeric data. The column length in characters for character and date-time data. |
NumericScale | Int32 | The number of digits to the right of the decimal point in numeric data. |
IsNullable | Boolean | Whether the parameter can contain null. |
IsRequired | Boolean | Whether the parameter is required for execution of the procedure. |
IsArray | Boolean | Whether the parameter is an array. |
Description | String | The description of the parameter. |
Ordinal | Int32 | The index of the parameter. |
Describes the primary and foreign keys.
The following query retrieves the primary key for the [CData].[Default].Customers table:
SELECT * FROM sys_keycolumns WHERE IsKey='True' AND TableName='Customers' AND CatalogName='CData' AND SchemaName='Default'
Name | Type | Description |
CatalogName | String | The name of the database containing the key. |
SchemaName | String | The name of the schema containing the key. |
TableName | String | The name of the table containing the key. |
ColumnName | String | The name of the key column. |
IsKey | Boolean | Whether the column is a primary key in the table referenced in the TableName field. |
IsForeignKey | Boolean | Whether the column is a foreign key referenced in the TableName field. |
PrimaryKeyName | String | The name of the primary key. |
ForeignKeyName | String | The name of the foreign key. |
ReferencedCatalogName | String | The database containing the primary key. |
ReferencedSchemaName | String | The schema containing the primary key. |
ReferencedTableName | String | The table containing the primary key. |
ReferencedColumnName | String | The column name of the primary key. |
Describes the foreign keys.
The following query retrieves all foreign keys which refer to other tables:
SELECT * FROM sys_foreignkeys WHERE ForeignKeyType = 'FOREIGNKEY_TYPE_IMPORT'
Name | Type | Description |
CatalogName | String | The name of the database containing the key. |
SchemaName | String | The name of the schema containing the key. |
TableName | String | The name of the table containing the key. |
ColumnName | String | The name of the key column. |
PrimaryKeyName | String | The name of the primary key. |
ForeignKeyName | String | The name of the foreign key. |
ReferencedCatalogName | String | The database containing the primary key. |
ReferencedSchemaName | String | The schema containing the primary key. |
ReferencedTableName | String | The table containing the primary key. |
ReferencedColumnName | String | The column name of the primary key. |
ForeignKeyType | String | Designates whether the foreign key is an import (points to other tables) or export (referenced from other tables) key. |
Describes the primary keys.
The following query retrieves the primary keys from all tables and views:
SELECT * FROM sys_primarykeys
Name | Type | Description |
CatalogName | String | The name of the database containing the key. |
SchemaName | String | The name of the schema containing the key. |
TableName | String | The name of the table containing the key. |
ColumnName | String | The name of the key column. |
KeySeq | String | The sequence number of the primary key. |
KeyName | String | The name of the primary key. |
Describes the available indexes. By filtering on indexes, you can write more selective queries with faster query response times.
The following query retrieves all indexes that are not primary keys:
SELECT * FROM sys_indexes WHERE IsPrimary='false'
Name | Type | Description |
CatalogName | String | The name of the database containing the index. |
SchemaName | String | The name of the schema containing the index. |
TableName | String | The name of the table containing the index. |
IndexName | String | The index name. |
ColumnName | String | The name of the column associated with the index. |
IsUnique | Boolean | True if the index is unique. False otherwise. |
IsPrimary | Boolean | True if the index is a primary key. False otherwise. |
Type | Int16 | An integer value corresponding to the index type: statistic (0), clustered (1), hashed (2), or other (3). |
SortOrder | String | The sort order: A for ascending or D for descending. |
OrdinalPosition | Int16 | The sequence number of the column in the index. |
Returns information on the available connection properties and those set in the connection string.
When querying this table, the config connection string should be used:
jdbc:cdata:apachehive:config:
This connection string enables you to query this table without a valid connection.
The following query retrieves all connection properties that have been set in the connection string or set through a default value:
SELECT * FROM sys_connection_props WHERE Value <> ''
Name | Type | Description |
Name | String | The name of the connection property. |
ShortDescription | String | A brief description. |
Type | String | The data type of the connection property. |
Default | String | The default value if one is not explicitly set. |
Values | String | A comma-separated list of possible values. A validation error is thrown if another value is specified. |
Value | String | The value you set or a preconfigured default. |
Required | Boolean | Whether the property is required to connect. |
Category | String | The category of the connection property. |
IsSessionProperty | String | Whether the property is a session property, used to save information about the current connection. |
Sensitivity | String | The sensitivity level of the property. This informs whether the property is obfuscated in logging and authentication forms. |
PropertyName | String | A camel-cased truncated form of the connection property name. |
Ordinal | Int32 | The index of the parameter. |
CatOrdinal | Int32 | The index of the parameter category. |
Hierarchy | String | Shows the dependent properties that need to be set alongside this one. |
Visible | Boolean | Informs whether the property is visible in the connection UI. |
ETC | String | Miscellaneous information about the property. |
Describes the SELECT query processing that the Cloud can offload to the data source.
See SQL Compliance for SQL syntax details.
Below is an example data set of SQL capabilities. Some aspects of SELECT functionality are returned in a comma-separated list if supported; otherwise, the column contains NO.
Name | Description | Possible Values |
AGGREGATE_FUNCTIONS | Supported aggregation functions. | AVG, COUNT, MAX, MIN, SUM, DISTINCT |
COUNT | Whether COUNT function is supported. | YES, NO |
IDENTIFIER_QUOTE_OPEN_CHAR | The opening character used to escape an identifier. | [ |
IDENTIFIER_QUOTE_CLOSE_CHAR | The closing character used to escape an identifier. | ] |
SUPPORTED_OPERATORS | A list of supported SQL operators. | =, >, <, >=, <=, <>, !=, LIKE, NOT LIKE, IN, NOT IN, IS NULL, IS NOT NULL, AND, OR |
GROUP_BY | Whether GROUP BY is supported, and, if so, the degree of support. | NO, NO_RELATION, EQUALS_SELECT, SQL_GB_COLLATE |
OJ_CAPABILITIES | The supported varieties of outer joins. | NO, LEFT, RIGHT, FULL, INNER, NOT_ORDERED, ALL_COMPARISON_OPS |
OUTER_JOINS | Whether outer joins are supported. | YES, NO |
SUBQUERIES | Whether subqueries are supported, and, if so, the degree of support. | NO, COMPARISON, EXISTS, IN, CORRELATED_SUBQUERIES, QUANTIFIED |
STRING_FUNCTIONS | Supported string functions. | LENGTH, CHAR, LOCATE, REPLACE, SUBSTRING, RTRIM, LTRIM, RIGHT, LEFT, UCASE, SPACE, SOUNDEX, LCASE, CONCAT, ASCII, REPEAT, OCTET, BIT, POSITION, INSERT, TRIM, UPPER, REGEXP, LOWER, DIFFERENCE, CHARACTER, SUBSTR, STR, REVERSE, PLAN, UUIDTOSTR, TRANSLATE, TRAILING, TO, STUFF, STRTOUUID, STRING, SPLIT, SORTKEY, SIMILAR, REPLICATE, PATINDEX, LPAD, LEN, LEADING, KEY, INSTR, INSERTSTR, HTML, GRAPHICAL, CONVERT, COLLATION, CHARINDEX, BYTE |
NUMERIC_FUNCTIONS | Supported numeric functions. | ABS, ACOS, ASIN, ATAN, ATAN2, CEILING, COS, COT, EXP, FLOOR, LOG, MOD, SIGN, SIN, SQRT, TAN, PI, RAND, DEGREES, LOG10, POWER, RADIANS, ROUND, TRUNCATE |
TIMEDATE_FUNCTIONS | Supported date/time functions. | NOW, CURDATE, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, MONTH, QUARTER, WEEK, YEAR, CURTIME, HOUR, MINUTE, SECOND, TIMESTAMPADD, TIMESTAMPDIFF, DAYNAME, MONTHNAME, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, EXTRACT |
REPLICATION_SKIP_TABLES | Indicates tables skipped during replication. | |
REPLICATION_TIMECHECK_COLUMNS | A string array of columns that are checked, in the given order, for use as the modified column during replication. |
IDENTIFIER_PATTERN | String value indicating what string is valid for an identifier. | |
SUPPORT_TRANSACTION | Indicates if the provider supports transactions such as commit and rollback. | YES, NO |
DIALECT | Indicates the SQL dialect to use. | |
KEY_PROPERTIES | Indicates the properties which identify the uniform database. | |
SUPPORTS_MULTIPLE_SCHEMAS | Indicates if multiple schemas may exist for the provider. | YES, NO |
SUPPORTS_MULTIPLE_CATALOGS | Indicates if multiple catalogs may exist for the provider. | YES, NO |
DATASYNCVERSION | The CData Data Sync version needed to access this driver. | Standard, Starter, Professional, Enterprise |
DATASYNCCATEGORY | The CData Data Sync category of this driver. | Source, Destination, Cloud Destination |
SUPPORTSENHANCEDSQL | Whether enhanced SQL functionality beyond what is offered by the API is supported. | TRUE, FALSE |
SUPPORTS_BATCH_OPERATIONS | Whether batch operations are supported. | YES, NO |
SQL_CAP | All supported SQL capabilities for this driver. | SELECT, INSERT, DELETE, UPDATE, TRANSACTIONS, ORDERBY, OAUTH, ASSIGNEDID, LIMIT, LIKE, BULKINSERT, COUNT, BULKDELETE, BULKUPDATE, GROUPBY, HAVING, AGGS, OFFSET, REPLICATE, COUNTDISTINCT, JOINS, DROP, CREATE, DISTINCT, INNERJOINS, SUBQUERIES, ALTER, MULTIPLESCHEMAS, GROUPBYNORELATION, OUTERJOINS, UNIONALL, UNION, UPSERT, GETDELETED, CROSSJOINS, GROUPBYCOLLATE, MULTIPLECATS, FULLOUTERJOIN, MERGE, JSONEXTRACT, BULKUPSERT, SUM, SUBQUERIESFULL, MIN, MAX, JOINSFULL, XMLEXTRACT, AVG, MULTISTATEMENTS, FOREIGNKEYS, CASE, LEFTJOINS, COMMAJOINS, WITH, LITERALS, RENAME, NESTEDTABLES, EXECUTE, BATCH, BASIC, INDEX |
PREFERRED_CACHE_OPTIONS | A string value that specifies the preferred cacheOptions. |
ENABLE_EF_ADVANCED_QUERY | Indicates if the driver directly supports advanced queries coming from Entity Framework. If not, queries will be handled client side. | YES, NO |
PSEUDO_COLUMNS | A string array indicating the available pseudo columns. | |
MERGE_ALWAYS | If the value is TRUE, Merge Mode is forcibly executed in Data Sync. | TRUE, FALSE |
REPLICATION_MIN_DATE_QUERY | A select query to return the replicate start datetime. | |
REPLICATION_MIN_FUNCTION | Allows a provider to specify the formula name to use for executing a server side min. | |
REPLICATION_START_DATE | Allows a provider to specify a replicate startdate. | |
REPLICATION_MAX_DATE_QUERY | A select query to return the replicate end datetime. | |
REPLICATION_MAX_FUNCTION | Allows a provider to specify the formula name to use for executing a server side max. | |
IGNORE_INTERVALS_ON_INITIAL_REPLICATE | A list of tables which will skip dividing the replicate into chunks on the initial replicate. | |
CHECKCACHE_USE_PARENTID | Indicates whether the CheckCache statement should be done against the parent key column. | TRUE, FALSE |
CREATE_SCHEMA_PROCEDURES | Indicates stored procedures that can be used for generating schema files. |
The following query retrieves the operators that can be used in the WHERE clause:
SELECT * FROM sys_sqlinfo WHERE Name = 'SUPPORTED_OPERATORS'
Note that individual tables may have different limitations or requirements on the WHERE clause; refer to the Data Model section for more information.
Name | Type | Description |
NAME | String | A component of SQL syntax, or a capability that can be processed on the server. |
VALUE | String | Detail on the supported SQL or SQL syntax. |
Returns information about attempted modifications.
The following query retrieves the Ids of the modified rows in a batch operation:
SELECT * FROM sys_identity
Name | Type | Description |
Id | String | The database-generated Id returned from a data modification operation. |
Batch | String | An identifier for the batch. 1 for a single operation. |
Operation | String | The result of the operation in the batch: INSERTED, UPDATED, or DELETED. |
Message | String | SUCCESS or an error message if the update in the batch failed. |
The connection string properties are the various options that can be used to establish a connection. This section provides a complete list of the options you can configure in the connection string for this provider. Click the links for further details.
For more information on establishing a connection, see Establishing a Connection.
Property | Description |
AuthScheme | The authentication scheme used. Accepted entries are Plain, LDAP, NoSasl, and Kerberos. |
Server | The host name or IP address of the server hosting HiveServer2. |
Port | The port for the connection to the HiveServer2 instance. |
User | The username used to authenticate with Hive. |
Password | The password used to authenticate with Hive. |
ProtocolVersion | The Protocol Version used to authenticate with Hive. |
TransportMode | The transport mode to use to communicate with the Hive server. Accepted entries are BINARY and HTTP. |
Database | The name of the Hive database to use by default. |
ImpersonationProxyUser | The proxy user of the Hive user impersonation. |
Property | Description |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
Property | Description |
SSHAuthMode | The authentication method used when establishing an SSH Tunnel to the service. |
SSHClientCert | A certificate to be used for authenticating the SSHUser. |
SSHClientCertPassword | The password of the SSHClientCert key if it has one. |
SSHClientCertSubject | The subject of the SSH client certificate. |
SSHClientCertType | The type of SSHClientCert private key. |
SSHServer | The SSH server. |
SSHPort | The SSH port. |
SSHUser | The SSH user. |
SSHPassword | The SSH password. |
SSHServerFingerprint | The SSH server fingerprint. |
UseSSH | Whether to tunnel the Apache Hive connection over SSH. |
Property | Description |
Verbosity | The verbosity level that determines the amount of detail included in the log file. |
Property | Description |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC. |
Property | Description |
AsyncQueryTimeout | The timeout for asynchronous requests issued by the provider to download large result sets. |
EnableXSRF | This option specifies whether to add the X-XSRF-Header filter to HiveServer2 HTTP mode. |
FailoverHosts | This property allows you to specify a list of failover hosts in addition to the one configured in Server and Port . |
HTTPPath | The path component of the URL endpoint when using HTTP TransportMode. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses. |
Pagesize | The maximum number of results to return per page from Apache Hive. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns to the table. |
ServerConfigurations | A list of server configuration variables to override the server defaults. |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
UseDescTableQuery | This option specifies whether the columns will be retrieved using a DESC TABLE query or the GetColumns Thrift API. |
UseShowDatabasesQuery | This option specifies whether the schemas will be retrieved using a SHOW DATABASES query or the GetSchemas Thrift API. |
UseShowTablesQuery | This option specifies whether the tables will be retrieved using a SHOW TABLES query or the GetTables Thrift API. |
UseSSL | Specifies whether to use SSL Encryption when connecting to Hive. |
UseZookeeperDiscovery | Specifies whether to use ZooKeeper Service Discovery. |
ZookeeperNamespace | The namespace configured on ZooKeeper for the Hive Server 2 znodes. |
This section provides a complete list of the Authentication properties you can configure in the connection string for this provider.
Property | Description |
AuthScheme | The authentication scheme used. Accepted entries are Plain, LDAP, NoSasl, and Kerberos. |
Server | The host name or IP address of the server hosting HiveServer2. |
Port | The port for the connection to the HiveServer2 instance. |
User | The username used to authenticate with Hive. |
Password | The password used to authenticate with Hive. |
ProtocolVersion | The Protocol Version used to authenticate with Hive. |
TransportMode | The transport mode to use to communicate with the Hive server. Accepted entries are BINARY and HTTP. |
Database | The name of the Hive database to use by default. |
ImpersonationProxyUser | The proxy user of the Hive user impersonation. |
The authentication scheme used. Accepted entries are Plain, LDAP, NoSasl, and Kerberos.
string
"Plain"
The AuthScheme used to authenticate with Hive.
NoSasl | Used when the hive.server2.authentication property is set to NoSasl. |
Plain (default) | Used when the hive.server2.authentication property is set to None (uses Plain SASL), PAM, or CUSTOM. If User or Password are not set, 'anonymous' will be sent for these fields. |
LDAP | Used when the hive.server2.authentication property is set to LDAP. |
Kerberos | Used when the hive.server2.authentication property is set to Kerberos. |
The host name or IP address of the server hosting HiveServer2.
string
""
If multiple Hive servers are available, FailoverHosts can be set to the additional Hive instances.
If UseZookeeperDiscovery is set to True, Server and Port must be set to the ZooKeeper server.
The port for the connection to the HiveServer2 instance.
string
"10000"
When using BINARY TransportMode, this property should be set to the value in the 'hive.server2.thrift.port' property of the Hive configuration file (hive-site.xml).
When using HTTP TransportMode, this property should be set to the value in the 'hive.server2.thrift.http.port' property of the Hive configuration file (hive-site.xml).
The username used to authenticate with Hive.
string
""
The username used to authenticate with Hive.
The password used to authenticate with Hive.
string
""
The password used to authenticate with Hive.
The Protocol Version used to authenticate with Hive.
string
"1"
The most efficient protocol version will be determined automatically by the CData Cloud upon connecting to Hive. This property allows you to explicitly specify the version to use and overrides the version determined by the CData Cloud.
The transport mode to use to communicate with the Hive server. Accepted entries are BINARY and HTTP.
string
"BINARY"
The transport mode used to communicate with the Hive server.
This property should be set to the 'hive.server2.transport.mode' value specified in your Hive configuration file (hive-site.xml).
The name of the Hive database to use by default.
string
""
When specified, the CData Cloud will issue a 'USE [Database]' command upon connecting to Hive. This will be the database schema used when executing queries that do not have a schema explicitly specified.
To execute queries to other schemas, the schema can be explicitly specified in the statement.
When Database is not set, the 'default' database schema will be used (no 'USE' statement is issued to Hive in this case).
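For example, a query can target a schema other than the default by qualifying the table name explicitly (the schema and table names below are placeholders):
SELECT * FROM [OtherSchema].[Customers]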
The proxy user of the Hive user impersonation.
string
""
The proxy user of the Hive user impersonation.
This section provides a complete list of the SSL properties you can configure in the connection string for this provider.
Property | Description |
SSLServerCert | The certificate to be accepted from the server when connecting using TLS/SSL. |
The certificate to be accepted from the server when connecting using TLS/SSL.
string
""
If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine is rejected.
This property can take the following forms:
Description | Example |
A full PEM Certificate (example shortened for brevity) | -----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE----- |
A path to a local file containing the certificate | C:\cert.cer |
The public key (example shortened for brevity) | -----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY----- |
The MD5 Thumbprint (hex values can also be either space or colon separated) | ecadbdda5a1529c58a1e9e09828d70e4 |
The SHA1 Thumbprint (hex values can also be either space or colon separated) | 34a929226ae0819f2ec14b4a3d904f801cbb150d |
If not specified, any certificate trusted by the machine is accepted.
Use '*' to accept all certificates. Note that this is not recommended due to security concerns.
This section provides a complete list of the SSH properties you can configure in the connection string for this provider.
Property | Description |
SSHAuthMode | The authentication method used when establishing an SSH Tunnel to the service. |
SSHClientCert | A certificate to be used for authenticating the SSHUser. |
SSHClientCertPassword | The password of the SSHClientCert key if it has one. |
SSHClientCertSubject | The subject of the SSH client certificate. |
SSHClientCertType | The type of SSHClientCert private key. |
SSHServer | The SSH server. |
SSHPort | The SSH port. |
SSHUser | The SSH user. |
SSHPassword | The SSH password. |
SSHServerFingerprint | The SSH server fingerprint. |
UseSSH | Whether to tunnel the Apache Hive connection over SSH. |
The authentication method used when establishing an SSH Tunnel to the service.
string
"Password"
A certificate to be used for authenticating the SSHUser.
string
""
SSHClientCert must contain a valid private key in order to use public key authentication. A public key is optional; if one is not included, the Cloud generates it from the private key. The Cloud sends the public key to the server, and the connection is allowed if the user has authorized the public key.
The SSHClientCertType field specifies the type of the key store specified by SSHClientCert. If the store is password protected, specify the password in SSHClientCertPassword.
Some types of key stores are containers which may include multiple keys. By default the Cloud will select the first key in the store, but you can specify a specific key using SSHClientCertSubject.
The password of the SSHClientCert key if it has one.
string
""
This property is only used when authenticating to SSH servers with SSHAuthMode set to PublicKey and SSHClientCert set to a private key.
The subject of the SSH client certificate.
string
"*"
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property.
If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma separated list of distinguished name fields and values. For instance "CN=www.server.com, OU=test, C=US, [email protected]". Common fields and their meanings are displayed below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma it must be quoted.
The type of SSHClientCert private key.
string
"PEMKEY_FILE"
This property can take one of the following values:
Types | Description | Allowed Blob Values |
MACHINE/USER | | Blob values are not supported. |
JKSFILE/JKSBLOB | | base64-only |
PFXFILE/PFXBLOB | A PKCS12-format (.pfx) file. Must contain both a certificate and a private key. | base64-only |
PEMKEY_FILE/PEMKEY_BLOB | A PEM-format file. Must contain an RSA, DSA, or OPENSSH private key. Can optionally contain a certificate matching the private key. | base64 or plain text. Newlines may be replaced with spaces when providing the blob as text. |
PPKFILE/PPKBLOB | A PuTTY-format private key created using the puttygen tool. | base64-only |
XMLFILE/XMLBLOB | An XML key in the format generated by the .NET RSA class: RSA.ToXmlString(true). | base64 or plain text. |
The SSH server.
string
""
The SSH server.
The SSH port.
string
"22"
The SSH port.
The SSH user.
string
""
The SSH user.
The SSH password.
string
""
The SSH password.
The SSH server fingerprint.
string
""
The SSH server fingerprint.
Whether to tunnel the Apache Hive connection over SSH.
bool
false
By default the Cloud will attempt to connect directly to Apache Hive. When this option is enabled, the Cloud will instead establish an SSH connection with the SSHServer and tunnel the connection to Apache Hive through it.
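For example, a tunneled connection might combine the SSH properties as follows (host names and credentials are placeholders):
UseSSH=true;SSHServer=ssh.example.com;SSHPort=22;SSHUser=ssh_user;SSHPassword=ssh_password;Server=hiveserver;Port=10000;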
This section provides a complete list of the Logging properties you can configure in the connection string for this provider.
Property | Description |
Verbosity | The verbosity level that determines the amount of detail included in the log file. |
The verbosity level that determines the amount of detail included in the log file.
string
"1"
The verbosity level determines the amount of detail that the Cloud reports to the Logfile. Verbosity levels from 1 to 5 are supported. These are detailed in the Logging page.
This section provides a complete list of the Schema properties you can configure in the connection string for this provider.
Property | Description |
BrowsableSchemas | This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC. |
This property restricts the schemas reported to a subset of the available schemas. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.
string
""
Listing the schemas from databases can be expensive. Providing a list of schemas in the connection string improves the performance.
This section provides a complete list of the Miscellaneous properties you can configure in the connection string for this provider.
Property | Description |
AsyncQueryTimeout | The timeout for asynchronous requests issued by the provider to download large result sets. |
EnableXSRF | This option specifies whether to add the X-XSRF-Header filter to HiveServer2 HTTP mode. |
FailoverHosts | This property allows you to specify a list of failover hosts in addition to the one configured in Server and Port . |
HTTPPath | The path component of the URL endpoint when using HTTP TransportMode. |
MaxRows | Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses. |
Pagesize | The maximum number of results to return per page from Apache Hive. |
PseudoColumns | This property indicates whether or not to include pseudo columns as columns to the table. |
ServerConfigurations | A list of server configuration variables to override the server defaults. |
Timeout | The value in seconds until the timeout error is thrown, canceling the operation. |
UseDescTableQuery | This option specifies whether the columns will be retrieved using a DESC TABLE query or the GetColumns Thrift API. |
UseShowDatabasesQuery | This option specifies whether the schemas will be retrieved using a SHOW DATABASES query or the GetSchemas Thrift API. |
UseShowTablesQuery | This option specifies whether the tables will be retrieved using a SHOW TABLES query or the GetTables Thrift API. |
UseSSL | Specifies whether to use SSL Encryption when connecting to Hive. |
UseZookeeperDiscovery | Specifies whether to use ZooKeeper Service Discovery. |
ZookeeperNamespace | The namespace configured on ZooKeeper for the Hive Server 2 znodes. |
The timeout for asynchronous requests issued by the provider to download large result sets.
int
300
If the AsyncQueryTimeout property is set to 0, asynchronous operations do not time out; instead, they run until they complete successfully or encounter an error condition. This property is distinct from Timeout, which applies to individual operations, while AsyncQueryTimeout applies to the execution time of the operation as a whole.
If AsyncQueryTimeout expires and the asynchronous request has not finished being processed, the Cloud raises an error condition.
This option specifies whether to add the X-XSRF-Header filter to HiveServer2 HTTP mode.
bool
false
This option specifies whether to add the X-XSRF-Header filter to HiveServer2 HTTP mode.
This property allows you to specify a list of failover hosts in addition to the one configured in Server and Port .
string
""
Specify both a server and port with multiple values separated by commas. For example 'server1:port1,server2:port2'.
The Server and Port values take precedence and will be used to make the initial connection. Upon a failure to connect to the main host, the list of FailoverHosts will be used to attempt the connection.
The path component of the URL endpoint when using HTTP TransportMode.
string
"cliservice"
This property is used to specify the path component of the URL endpoint when using HTTP TransportMode.
This property should be set to the value specified in the 'hive.server2.thrift.http.path' property of your Hive configuration file (hive-site.xml).
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
int
-1
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
The maximum number of results to return per page from Apache Hive.
int
10000
The Pagesize property affects the maximum number of results to return per page from Apache Hive. Setting a higher value may result in better performance at the cost of additional memory allocated per page consumed.
This property indicates whether or not to include pseudo columns as columns to the table.
string
""
This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; for example, "*=*".
A list of server configuration variables to override the server defaults.
string
""
This property takes a comma-separated list of Hive configuration variables. Each value in the comma-separated list will be sent as specified to the Hive server via the 'set' command to override the server default values.
Example: hive.enforce.bucketing=true,hive.enforce.sorting=true
The value in seconds until the timeout error is thrown, canceling the operation.
int
60
If Timeout = 0, operations do not time out. The operations run until they complete successfully or until they encounter an error condition.
If Timeout expires and the operation is not yet complete, the Cloud throws an exception.
This option specifies whether the columns will be retrieved using a DESC TABLE query or the GetColumns Thrift API.
bool
false
When set to true, a DESC TABLE query will be issued to retrieve the columns for the table.
This option specifies whether the schemas will be retrieved using a SHOW DATABASES query or the GetSchemas Thrift API.
bool
false
When set to true, a SHOW DATABASES query will be issued to retrieve the schemas.
This option specifies whether the tables will be retrieved using a SHOW TABLES query or the GetTables Thrift API.
bool
false
When set to true, a SHOW TABLES query will be issued to retrieve the tables for the database.
Specifies whether to use SSL Encryption when connecting to Hive.
bool
false
Set this property to the value specified in the 'hive.server2.use.SSL' property of your Hive configuration file (hive-site.xml).
Specifies whether to use ZooKeeper Service Discovery.
bool
false
When set to 'True', Hive servers will be discovered via the ZooKeeper service.
The Server and Port must be set to the configured ZooKeeper server and port. The ZookeeperNamespace property must be set to the namespace configured on ZooKeeper for the Hive Server 2 znodes.
If multiple ZooKeeper servers are available, FailoverHosts can be set to the additional ZooKeeper instances.
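For example, a discovery-based connection might look like the following (host, port, and namespace values are placeholders):
Server=zookeeper_host;Port=2181;UseZookeeperDiscovery=True;ZookeeperNamespace=hiveserver2;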
The namespace configured on ZooKeeper for the Hive Server 2 znodes.
string
""
This property is only applicable when UseZookeeperDiscovery is set to 'True'.
Apache Thrift Client
The Apache License Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Apache Zookeeper Client
The Apache License Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.