Apache Hive is a data warehouse infrastructure which provides query, data summarization, and analysis, built on top of Hadoop. The Apache Hive data warehouse software facilitates writing, reading, and managing large datasets with distributed storage using SQL. A JDBC driver and a command line tool are provided to connect users to Hive.

Type Name

hive

Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale-out and fault tolerance capabilities for data storage and processing (using the map-reduce programming paradigm) on commodity hardware. Hive has limited support for data types as it supports integer variants, boolean, float, double, and string. It does not have native support for time-based types, XML, or LOBs. These limitations are reflected in the connector capabilities. The view table can use these types, however, the transformation would need to specify the necessary transformations. Note that in those situations, the evaluations will be done in the Data Virtuality Server engine. Another limitation of Hive is that it only supports EQUI join, so using any other join types on its source tables will result in inefficient queries. To write criteria based on partitioned columns, they can be modelled on the source table, but not included in selected columns.

Connection Properties

Template name: hive2

Appropriate translator name: hive

Properties:

  • host (default: localhost)
  • port (default: 10000)
  • db (default: default)
  • user-name
  • password (default: empty)
  • driver (hive for hive-jdbc-1.2.1, oldhive for hive-jdbc-1.1.0; default: hive)
  • driver-class (default: org.apache.hive.jdbc.HiveDriver)

  • jdbc-properties (arbitrary extra properties)
  • new-connection-sql
  • check-valid-connection-sql (default: show databases)
  • min-pool-size (default: 2)
  • max-pool-size (default: 70)
  • cloudAgent (default: FALSE)

Translator Properties

Translator Properties Shared by All JDBC Connectors

(Properties listed in alphabetical order)

To view the full table, click the expand button in its top right corner


Name

Description

Default value

comparisonStringConversion

Sets a template to convert WHERE clauses on strings

Examples
comparisonStringConversion=%s
-- no conversion will be applied
 
comparisonStringConversion=binary %s
-- WHERE binary someStringExpression LIKE someOtherStringExpression
 
comparisonStringConversion=(%s COLLATE Latin1_General_CS_AS)
-- WHERE (someStringExpression COLLATE Latin1_General_CS_AS LIKE someOtherStringExpression)


  • MS SQL Server:
    (%s COLLATE Latin1_General_CS_AS)
  • MySQL, SingleStore:
    binary %s
  • Others:
    %s

DatabaseTimeZone

Database time zone, used when fetching date, time, or timestamp values

System default time zone

DatabaseVersion

Specific database version, used to fine-tune pushdown support

Automatically detected by the server through the data source JDBC driver, if possible

forbidNonMatchingNames

Only considered when importer.tableNamePattern is set. When set to FALSE, allows creation of tables with names that do not match the tableNamePattern. Tables with non-matching names will only be accessible until the server restarts

TRUE

ForceQuotedIdentifiers

If TRUE, all identifiers in the final query (that is being generated by a translator to be sent directly to the target DBMS) will be quoted

TRUE

MaxPreparedInsertBatchSize

Maximum size of prepared insert batch

2048

OrderByStringConversion

Sets a template to convert ORDER BY clauses on strings

Examples
OrderByStringConversion=%s
-- no conversion will be applied
 
OrderByStringConversion=(binary %s)
-- ORDER BY (binary someStringExpression)
 
OrderByStringConversion=cast((%s) as varchar(2000) ccsid 1208)
-- ORDER BY cast((someStringExpression) as varchar(2000) ccsid 1208)
  • DB2 for AS/400:
    cast((%s) as varchar(2000) ccsid 1208)
  • DB2 for zOS:
    cast((%s) as varchar(2000) ccsid ASCII)
  • MS SQL Server:
    (cast(cast(%s as varchar) as varbinary)
  • MySQL, SingleStore:
    (binary %s)
  • PostgreSQL:
    (%s COLLATE \"C\")
  • Others:
    %s

supportsConvertFromClobToString

If TRUE, indicates that the translator supports the CONVERT/CAST function from clob to string

  • MySQL, PostgreSQL, Redshift, Greenplum, Oracle, MS SQL Server:
    TRUE 
  • Others:
    FALSE

supportsNativeQueries

Forces a translator to issue a native() system procedure that can be used to pass native queries directly to an underlying DBMS

Example
SELECT x.* 
FROM table
CALL "dwh.native"("request" => 'SELECT query, pid, elapsed, substring FROM svl_qlog ORDER BY STARTTIME DESC LIMIT 200'
) w
, ARRAYTABLE( w.tuple COLUMNS query string, pid integer , elapsed string, "substring" string ) x;

More usage examples

  • MySQL, PostgreSQL, Redshift, Snowflake, Oracle, MS SQL Server; Exasol, Vertica, Bigquery, SingleStore, Azure:
    TRUE
  • Others:
    FALSE

supportsOrderByAlias

If FALSE, the CData Virtuality Server does not generate aliases in the ORDER BY clause.

If TRUE, indicates that the source supports aliases in the ORDER BY clause

  • PostgreSQL:
    TRUE
  • Others:
    • if OrderByStringConversion is not set, same as supportsOrderBy
    • if OrderByStringConversion is set, default for supportsOrderByAlias is  FALSE

supportsOrderByString

If FALSE, the CData Virtuality Server does not push down the ORDER BY clause if it contains at least one expression of string type.

If TRUE, indicates that the source supports strings in the ORDER BY clause

TrimStrings

If TRUE, trims trailing whitespace from fixed-length character strings. Please note that the CData Virtuality Server only has string or varchar type that treats trailing whitespace as meaningful

FALSE

UseBindVariables

if TRUE, indicates that PreparedStatements should be used and that literal values in the source query should be replaced with bind variables. If FALSE, only LOB values will trigger the use of PreparedStatements

TRUE

UseCommentsInSourceQuery

Embeds a / comment / leading comment with session/request id in the source SQL query for informational purposes

FALSE

The names of the translator properties are case-sensitive.

Translator Properties Specific for Hive

Name

Description

Default value

webhdfsEndpoint

Example: webhdfsEndpoint="webhdfs://myhdfsnode:50070"

Please note that the hostname should be the node name of the Hadoop node which runs the WebHDFS service. It is not possible to use the IP address or some alternative DNS name. If the node name does not resolve on the Data Virtuality Server, please add the node name with the correct IP address to the host file on the Data Virtuality Server


webhdfsLocation

Example: webhdfsLocation="/user/hive"


tableType

Possible values: TEXTFILE, PARQUET

Example:
tableType=PARQUET

PARQUET

PARQUET

Data Source Properties

Data Source Properties Shared by All JDBC Connectors

(Properties listed in alphabetical order)

To view the full table, click the expand button in its top right corner


Name

Description

Default

importer.autoCorrectColumnNames

Replaces . in a column name with _ as the period character is not supported by the CData Virtuality Server in column names

TRUE

importer.catalog

Database catalogs to use. Can be used if the db connection property is omitted.

Only for Microsoft SQL Server and Snowflake:

  • Multiple catalogs (multiple databases) usage is supported;
  • '%' as a wildcard is allowed: for example, importer.catalog="foo%" will import foo, foobar, etc.
  • To specify several catalog names or/and patterns, values should be comma-separated and enclosed within double quotes: importer.catalog="schema1,schema2,pattern1%,pattern2%";
  • To use all catalogs, the value should be '%': importer.catalog="%"
  • To escape a wildcard in a catalog name, use the '\' escape character. The value should be as follows: importer.catalog="test\_tables"

Exasol:

EXA_DB

All others: empty

importer.defaultSchema

  • Only for data sources that are also supported as Analytical Storage (SQLServer, MySQL, Oracle, PostgreSQL, Redshift);
  • When the property is correctly set, SELECT INTO, CREATE, and DROP TABLE commands are enabled for that data source;
  • Must point to the original schema name in the DBMS (e.g. importer.defaultSchema=public);
  • Needs specifying where tables will be created or dropped in the source DBMS;
  • Not meant to be used with Analytical Storage data source

Please note that writing into a data source is only possible if this parameter is set.

Empty

importer.enableMetadataCache

 Turns on metadata cache for a single data source even when the global option is turned off. Together with importer.skipMetadataLoadOnStartup=true, it allows using materialized views after server restart when the original source is unavailable

FALSE

importer.excludeProcedures

Case-insensitive regular expression that will exclude a matching fully qualified procedure name from import

Empty

importer.excludeSchemas

Comma-separated list of schemas (no % or ? wildcards allowed) to exclude listed schemas from import. A schema specified in defaultSchema or schemaPattern will be imported despite being listed in excludeSchemas. Helps to speed up metadata loading

Oracle:

APEX_PUBLIC_USER,
DIP,
FLOWS_040100,
FLOWS_020100,
FLOWS_FILES,
MDDATA,
ORACLE_OCM,
SPATIAL_CSW_ADMIN_USR,
SPATIAL_WFS_ADMIN_USR,
XS$NULL,
BI,
HR,
OE,
PM,
IX,
SH,
SYS,
SYSTEM,
MDSYS,
CTXSYS

All others: empty

importer.excludeTables

Case-insensitive regular expression that will exclude a matching fully qualified table name from import. Does not speed up metadata loading. Here are some examples:

1. Excluding all tables in the (source) schemas sys and INFORMATION_SCHEMA:

importer.excludeTables=(.*[.]sys[.].*|.*[.]INFORMATION_SCHEMA[.].*)

2. Excluding all tables except the ones starting with "public.br" and "public.mk" using a negative lookahead: 

importer.excludeTables=(?!public\.(br|mk)).*

3. Excluding "tablename11" from the list ["tablename1", "tablename11", "company", "companies"]:

importer.excludeTables=.*\.(?!\btablename1\b|\bcompan).*
  • ".*\." will match the schema portion of the table name;
  • "\b" is a word boundary. It is used to define the boundary between text characters and non-character text;
  • "\btablename1\b" will only match tablename1 and not tablename11 because of the use of "\b";
  • "\bcompan" will match "company" and "companies" because we did not use "\b" at the end to delimit a word boundary;
  • ".*" at the end is necessary to match the trailing characters after the name

Empty

importer.fetchSize

Fetch size assigned to a resultset on loading metadata

No default value

importer.importApproximateIndexes

If set to TRUE, imports approximate index information

TRUE

importer.importIndexes

If set to TRUE, imports index/unique key/cardinality information

FALSE

importer.importKeys

If set to TRUE, imports primary and foreign keys

FALSE

importer.importProcedures

If set to TRUE, imports procedures and procedure columns. Overloaded procedures can be imported with an additional option useProcedureSpecificName.

Please note that it is currently not possible to import procedures which use the same name for more than one parameter (e.g. same name for IN and OUT parameters). Such procedures can be excluded from import with the parameter excludeProcedures.
Please note that it is not always possible to import procedure result set columns due to database limitations

FALSE

TRUE only for CData connector

importer.loadColumnsTableByTable

Set to TRUE to force table by table metadata processing

FALSE/TRUE only for Netsuite and SAP Advantage Database Server

importer.loadMetadataWithJdbc

If set to TRUE, turns off all custom metadata load ways

FALSE

importer.loadSourceSystemFunctions

If set to TRUE, data source-specific functions are loaded. Supported for Microsoft SQL Server and Azure

FALSE

importer.procedureNamePattern

Procedure(s) to import. If omitted, all procedures will be imported. % as a wildcard is allowed: for example, importer.procedureNamePattern=foo% will import foo , foobar, etc. W orks only in combination with importProcedures

Empty

importer.quoteNameInSource

If set to FALSE, directs the Data Virtuality Server to create source queries using unquoted identifiers

TRUE

importer.renameDuplicateColumns

If set to TRUE,  renames duplicate columns caused by either mixed case collisions or autoCorrectColumnNames replacing . with _. The suffix _n where n is an integer will be added to make the name unique

TRUE

importer.renameDuplicateTables

If set to TRUE, renames duplicate tables caused by mixed case collisions. The suffix _n where n is an integer will be added to make the name unique

TRUE

importer.replaceSpecSymbsInColNames

If set to TRUE, replaces all special symbols (any symbols not in the ^A-Za-z0-9_ sequence) to the _ symbol in column names of tables

FALSE / TRUE only for BigQuery

importer.schemaPattern

Schema(s) to import. If omitted or has "" value, all schemas will be imported. % as wildcard is allowed: for example, importer.schemaPattern=foo% will import foo, foobar, etc. To specify several schema names or/and patterns, values should be comma-separated and enclosed within double quotes: importer.schemaPattern="schema1,schema2,pattern1%,pattern2%". For proper escaping of special characters depending on the type of data source, check Escaping special characters in schema names or use wildcards instead: "[schema_name]" can be rewritten as "%schema%name%". Helps to speed up metadata loading

Empty

importer.skipMetadataLoadOnStartup

If set to TRUE, allows skipping metadata loading on server startup if there's cached metadata for the data source. . Together with importer.enableMetadataCache=true, it allows using materialized views after server restart when the original source is unavailable

FALSE

importer.tableNamePattern

Table(s) to import. If omitted, all tables will be imported. % as a wildcard is allowed: for example, importer.tableNamePattern=foo% will import foo, foobar, etc

Empty

importer.tableTypes

Comma-separated list (without spaces) of table types to import. Available types depend on the DBMS. Usual format: "TABLE,VIEW".

Other typical types are "SYSTEM TABLE", "GLOBAL TEMPORARY", "LOCAL TEMPORARY", "ALIAS", "SYNONYM"

Empty

importer.useCatalogName

If set to TRUE, uses any non-null/non-empty catalogue name as part of the name in source, e.g. "catalogue"."table"."column", and in the CData Virtuality Server runtime name if useFullSchemaName is TRUE. If set to FALSE, will not use the catalogue name in either the name in source or the CData Virtuality Server runtime name. Should be set to FALSE for sources that do not fully support a catalogue concept, but return a non-null catalogue name in their metadata - such as HSQL

TRUE / FALSE only for Hive and EXASOL

importer.useFullSchemaName

If set to FALSE, directs the importer to drop the source catalogue/schema from the CData Virtuality Server object name so that the CData Virtuality Server fully qualified name will be in the form of <model name>.<table name>.

Please note that this may lead to objects with duplicate names when importing from multiple schemas, which results in an exception

TRUE

importer.useProcedureSpecificName

If set to TRUE, allows the import of overloaded procedures (which will normally result in a duplicate procedure error) by using the unique procedure specific name in the CData Virtuality Server. This option will only work with JDBC 4.0 compatible drivers that report specific names

FALSE

importer.widenUnsignedTypes

If set to TRUE, converts unsigned types to the next widest type. For example, SQL Server reports tinyint as an unsigned type. With this option enabled, tinyint would be imported as a short instead of a byte

TRUE

The names of the data source properties are case-sensitive.

Escaping wildcards in importer.catalog available since v4.0.8

Default values importer.catalog='EXA_DB' and importer.useCatalogName=FALSE available since v4.4

importer.loadSourceSystemFunctions is available since v4.6

importer.importProcedures set to TRUE by default for CData connector since v4.7

Data Source Properties Specific to Hive

Name

Description

Default value

importer.useDatabaseMetadata

For Hive 0.13.0 and later, the normal JDBC DatabaseMetaData facilities are sufficient to perform an import. Set to TRUE to use the normal JDBC import logic. Set to FALSE to use native Hive methods to import metadata. When TRUE, trimColumnNames has no effect

FALSE

importer.trimColumnNames

For Hive 0.11.0 and later, metadata for the DESCRIBE command could be inappropriately returned with padding. Set to TRUE to trim white spaces from column names

FALSE