FireDAC Components for Apache Hive

Build 23.0.8839

Establishing a Connection

You can create and manage connections with the Data Explorer and the FireDAC Connection Editor wizards. See Connection Definitions for more information on creating connections from code and creating persistent, private, or temporary connections.

Connecting to Apache Hive

The component supports connections to Apache Hive instances that are self-hosted, hosted on Amazon EMR, hosted on Azure HDInsight, or hosted on Google DataProc.

Self-hosted Instance

To connect to Apache Hive from a self-hosted instance, set these connection parameters:

  • TransportMode: The transport mode to use to communicate with the Hive server. Legal entries are BINARY (default) and HTTP.
  • Server: The host name or IP address of the server hosting HiveServer2.
  • Port: The port for the connection to the HiveServer2 instance.
  • UseSSL (optional): Set to enable TLS/SSL.
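As a minimal sketch, these parameters can also be applied to a TFDConnection at runtime; the host name and port below are placeholders for your own HiveServer2 instance:

```delphi
uses
  FireDAC.Comp.Client;

procedure ConnectSelfHosted;
var
  Conn: TFDConnection;
begin
  Conn := TFDConnection.Create(nil);
  try
    Conn.DriverName := 'CDataApacheHive';
    // Placeholder values: substitute your own HiveServer2 host and port
    Conn.Params.Add('Server=hive.example.com');
    Conn.Params.Add('Port=10000');
    Conn.Params.Add('TransportMode=BINARY');
    Conn.Params.Add('UseSSL=True');  // optional: enable TLS/SSL
    Conn.Connected := True;
  finally
    Conn.Free;
  end;
end;
```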

Amazon EMR

To connect to an Apache Hive instance hosted on Amazon EMR, you must first create an SSH tunnel to EMR. Once the tunnel is established, you are ready to connect.

Do the following:

  1. Create the SSH tunnel to EMR:
    1. Ensure that you have access to an active EMR cluster and an EC2 key pair. The key pair can be in .ppk or .pem format.
    2. Authorize inbound traffic in your cluster settings.
  2. After an active tunnel session to EMR is established, set these connection parameters:
    • Server: The master node (master-public-dns-name) where the Apache Hive server is running.
    • Port: The port required to connect to Apache Hive.
    • UseSSH: True.
    • SSHServer: The master node (master-public-dns-name).
    • SSHPort: 22.
    • SSHAuthMode: PUBLIC_KEY.
    • SSHUser: hadoop.
    • SSHClientCert: The full path to the key file.
    • SSHClientCertType: The type that corresponds to the key file. Typically either PEMKEY_FILE or PPKFILE.
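As a sketch, the EMR parameters above map onto a TFDConnection like this; the master node DNS name and key file path are placeholders:

```delphi
uses
  FireDAC.Comp.Client;

procedure ConnectEMR;
var
  Conn: TFDConnection;
begin
  Conn := TFDConnection.Create(nil);
  try
    Conn.DriverName := 'CDataApacheHive';
    // Placeholder values: substitute your cluster's master node and key file
    Conn.Params.Add('Server=master-public-dns-name');
    Conn.Params.Add('Port=10000');
    Conn.Params.Add('UseSSH=True');
    Conn.Params.Add('SSHServer=master-public-dns-name');
    Conn.Params.Add('SSHPort=22');
    Conn.Params.Add('SSHAuthMode=PUBLIC_KEY');
    Conn.Params.Add('SSHUser=hadoop');
    Conn.Params.Add('SSHClientCert=C:\keys\emr-key.pem');     // placeholder path
    Conn.Params.Add('SSHClientCertType=PEMKEY_FILE');
    Conn.Connected := True;
  finally
    Conn.Free;
  end;
end;
```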

Hadoop Cluster on Azure HDInsight

To establish a connection to a Hadoop cluster hosted on Azure HDInsight, set these connection parameters:

  • User: The cluster username you specified when creating the cluster on Azure HDInsight.
  • Password: The cluster password you specified when creating the cluster on Azure HDInsight.
  • Server: The server corresponding to your cluster. For example: myclustername.azurehdinsight.net.
  • Port: The port running HiveServer2 (443 is the default).
  • HTTPPath: The HTTP path for the hive2 service (hive2 by default).
  • TransportMode: HTTP.
  • UseSSL: True.
  • QueryPassthrough (optional): True to bypass the SQL engine of the component and execute HiveQL queries directly to Apache Hive.
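A hedged runtime sketch of the HDInsight parameters above; the user name, password, and cluster name are placeholders:

```delphi
uses
  FireDAC.Comp.Client;

procedure ConnectHDInsight;
var
  Conn: TFDConnection;
begin
  Conn := TFDConnection.Create(nil);
  try
    Conn.DriverName := 'CDataApacheHive';
    // Placeholder credentials and cluster name: substitute your own
    Conn.Params.Add('User=clusteruser');
    Conn.Params.Add('Password=clusterpassword');
    Conn.Params.Add('Server=myclustername.azurehdinsight.net');
    Conn.Params.Add('Port=443');
    Conn.Params.Add('HTTPPath=hive2');
    Conn.Params.Add('TransportMode=HTTP');
    Conn.Params.Add('UseSSL=True');
    Conn.Connected := True;
  finally
    Conn.Free;
  end;
end;
```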

Google DataProc

To connect to a Google DataProc instance, first ensure that the Apache Hive server on DataProc is properly configured, then build an SSH tunnel to the Hive cluster web interface. Once both steps are done, you are ready to connect.

Do the following:

  1. Ensure that the Apache Hive server on DataProc was created with the DataProc Component Gateway enabled.
  2. To obtain the external IP address of the Hive cluster, open Cloud Shell and list the instances:
    gcloud compute instances list

    Note the external IP of the relevant machine.

  3. Build an SSH tunnel to the Hive cluster web interface:
    1. Navigate to the Hive cluster on DataProc and select the WEB INTERFACES tab.
    2. Select Create an SSH tunnel to connect to a web interface. A cloud console command displays.
      Use this console to create an SSH key pair. Download the private key from the directory specified in the console.
    3. Configure the SSH tunnel in an SSH utility:
      • Host Name: The external IP noted above.
      • Port: 22
      • Point the tool to your private SSH key.
      • For the Tunnel, map an open local port to localhost:10000 (10000 is the default HiveServer2 binary port). The remote server resolves localhost itself.
  4. Now you are ready to connect to Apache Hive on Google DataProc. Set these connection parameters:
    • TransportMode: BINARY.
    • AuthScheme: Plain.
    • Port: The chosen SSH Tunnel port on the local machine.
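The steps above can be sketched in code; the port value is a placeholder for whichever local port you mapped in the SSH tunnel:

```delphi
uses
  FireDAC.Comp.Client;

procedure ConnectDataProc;
var
  Conn: TFDConnection;
begin
  Conn := TFDConnection.Create(nil);
  try
    Conn.DriverName := 'CDataApacheHive';
    Conn.Params.Add('TransportMode=BINARY');
    Conn.Params.Add('AuthScheme=Plain');
    // Placeholder: use the local port you mapped to localhost:10000 in the tunnel
    Conn.Params.Add('Port=10000');
    Conn.Connected := True;
  finally
    Conn.Free;
  end;
end;
```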

Authenticating to Apache Hive

Apache Hive supports PLAIN SASL, LDAP, NOSASL, and Kerberos authentication.

PLAIN SASL

To authenticate to Apache Hive with PLAIN SASL, set the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to None and set the following component connection properties:

  • AuthScheme: PLAIN
  • User: The user logging in. If no name is supplied, the login user is set to "anonymous".
  • Password: The user's password. If no password is supplied, the password is set to "anonymous".
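A minimal sketch of the PLAIN SASL properties, assuming Conn is an existing TFDConnection with its DriverName set to CDataApacheHive; the user name and password are placeholders:

```delphi
// Assumes Conn is a TFDConnection with DriverName = 'CDataApacheHive'
Conn.Params.Add('AuthScheme=PLAIN');
Conn.Params.Add('User=hiveuser');      // placeholder; "anonymous" is used if omitted
Conn.Params.Add('Password=hivepass');  // placeholder; "anonymous" is used if omitted
Conn.Connected := True;
```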

LDAP

To authenticate to Apache Hive with LDAP, set the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to LDAP and set the following connection properties in the component:
  • AuthScheme: LDAP
  • User: The user logging in.
  • Password: The user's password.

NOSASL

To authenticate to Apache Hive using NOSASL, set both the AuthScheme component connection property and the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to NOSASL.

Kerberos

To authenticate to Apache Hive with Kerberos, set AuthScheme to KERBEROS.

Authenticating to Apache Hive via Kerberos requires you to define authentication properties and to choose how Kerberos should retrieve authentication tickets.

Retrieve Kerberos Tickets

Kerberos tickets are used to authenticate the requester's identity. The use of tickets instead of formal logins/passwords eliminates the need to store passwords locally or send them over a network. Users are reauthenticated (tickets are refreshed) whenever they log in at their local computer or enter kinit USER at the command prompt.

The component provides three ways to retrieve the required Kerberos ticket, depending on whether the KRB5CCNAME environment variable and/or the KerberosKeytabFile property are set.

MIT Kerberos Credential Cache File

This option enables you to use the MIT Kerberos Ticket Manager or kinit command to get tickets. With this option there is no need to set the User or Password connection properties.

This option requires that the KRB5CCNAME environment variable has been created on your system.

To enable ticket retrieval via MIT Kerberos Credential Cache Files:

  1. Ensure that the KRB5CCNAME variable is present in your environment.
  2. Set KRB5CCNAME to a path that points to your credential cache file. (For example, C:\krb_cache\krb5cc_0 or /tmp/krb5cc_0.) The credential cache file is created when you use the MIT Kerberos Ticket Manager to generate your ticket.
  3. To obtain a ticket:
    1. Open the MIT Kerberos Ticket Manager application.
    2. Click Get Ticket.
    3. Enter your principal name and password.
    4. Click OK.

    If the ticket is successfully obtained, the ticket information appears in Kerberos Ticket Manager and is stored in the credential cache file.

The component uses the cache file to obtain the Kerberos ticket to connect to Apache Hive.

Note: If you would prefer not to edit KRB5CCNAME, you can use the KerberosTicketCache property to set the file path manually. After this is set, the component uses the specified cache file to obtain the Kerberos ticket to connect to Apache Hive.

Keytab File

If your environment lacks the KRB5CCNAME environment variable, you can retrieve a Kerberos ticket using a Keytab File.

To use this method, set the User property to the desired username, and set the KerberosKeytabFile property to a file path pointing to the keytab file associated with the user.

User and Password

If your environment lacks the KRB5CCNAME environment variable and the KerberosKeytabFile property has not been set, you can retrieve a ticket using a user and password combination.

To use this method, set the User and Password properties to the user/password combination that you use to authenticate with Apache Hive.
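Taken together, the three retrieval methods differ only in which connection properties are set. A hedged sketch, assuming Conn is an existing TFDConnection using the CDataApacheHive driver; all paths and names are placeholders:

```delphi
Conn.Params.Add('AuthScheme=KERBEROS');

// Option 1: MIT credential cache (or rely on the KRB5CCNAME environment variable)
Conn.Params.Add('KerberosTicketCache=C:\krb_cache\krb5cc_0');

// Option 2: keytab file for a named user
// Conn.Params.Add('User=hiveuser');
// Conn.Params.Add('KerberosKeytabFile=C:\keys\hiveuser.keytab');

// Option 3: user and password
// Conn.Params.Add('User=hiveuser');
// Conn.Params.Add('Password=hivepass');
```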

Enabling Cross-Realm Authentication

More complex Kerberos environments can require cross-realm authentication where multiple realms and KDC servers are used. For example, they might use one realm/KDC for user authentication, and another realm/KDC for obtaining the service ticket.

To enable this kind of cross-realm authentication, set the KerberosRealm and KerberosKDC properties to the values required for user authentication. Also, set the KerberosServiceRealm and KerberosServiceKDC properties to the values required to obtain the service ticket.
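A cross-realm sketch, again assuming Conn is an existing TFDConnection using the CDataApacheHive driver; the realm and KDC names are placeholders:

```delphi
// Placeholder realm and KDC names: one realm/KDC pair for user
// authentication, another for obtaining the service ticket
Conn.Params.Add('AuthScheme=KERBEROS');
Conn.Params.Add('KerberosRealm=USERS.EXAMPLE.COM');
Conn.Params.Add('KerberosKDC=kdc.users.example.com');
Conn.Params.Add('KerberosServiceRealm=HIVE.EXAMPLE.COM');
Conn.Params.Add('KerberosServiceKDC=kdc.hive.example.com');
```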

Using the FireDAC Connection Editor

Complete the following steps to use the FireDAC Connection Editor to assign an Apache Hive connection to a TFDConnection object:

  1. Open a project and drop a TFDPhysCDataApacheHiveDriverLink from the Tool Palette onto the Form Designer.
  2. Drop a TFDConnection object onto the Form Designer.
  3. Double-click the TFDConnection and select CDataApacheHive in the Driver Id menu.
  4. Define the necessary connection properties.
  5. To execute ad-hoc SQL statements based on this connection, enter SQL commands on the SQL Script tab.
  6. In the Form Designer, select the TFDConnection object and set the Connected property to True.
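The design-time steps above have a runtime counterpart for executing ad-hoc SQL. A sketch, assuming a configured TFDConnection; the table name is hypothetical:

```delphi
uses
  FireDAC.Comp.Client;

procedure RunQuery(Conn: TFDConnection);
var
  Query: TFDQuery;
begin
  Query := TFDQuery.Create(nil);
  try
    Query.Connection := Conn;
    Query.SQL.Text := 'SELECT * FROM sample_table';  // hypothetical table name
    Query.Open;
    while not Query.Eof do
    begin
      // Process Query.Fields here
      Query.Next;
    end;
  finally
    Query.Free;
  end;
end;
```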

Using the Data Explorer

Complete the following steps to use the Data Explorer to define persistent connections that can be shared across applications and projects:

  1. Click View > Tool Windows > Data Explorer in RAD Studio and expand the FireDAC node.
  2. Right-click the CData Apache Hive Data Source node and click Add New Connection.
  3. Enter a name for the connection. The FireDAC Connection Editor opens.

Working with Persistent Connections

The connections you define in the Data Explorer can be assigned to a TFDConnection object by specifying the connection name in the TFDConnection object's ConnectionDefName property.
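For example, a persistent connection defined in the Data Explorer can be attached in code; "HiveConn" below is a placeholder for your connection's name:

```delphi
// Attach a persistent Data Explorer connection by name
FDConnection1.ConnectionDefName := 'HiveConn';  // placeholder name
FDConnection1.Connected := True;
```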

Browsing Data and Metadata

The Data Explorer also provides options for browsing Apache Hive objects at design time:

  • Expand the CData Apache Hive Data Source node in the Data Explorer to view the defined connections.
  • Expand a connection node to find the database objects available for a connection.
  • Drill down to a database object to find metadata such as primary keys.
  • To display data, double-click a table or view or right-click and click View.

Copyright (c) 2024 CData Software, Inc. - All rights reserved.