Excel Add-In for Apache Hive

Build 24.0.9060

Establishing a Connection

Configure a Connection Profile

From the CData ribbon, click Get Data and select From Apache Hive connection/s to launch the CData Query window. To setup a new connection, you will have to click the New Apache Hive Connection button. Here you can set the connection settings, test the connection, and save the connection profile.

Connecting to Apache Hive

Apache Hive supports connections from instances that are self-hosted, hosted on Amazon EMR, hosted on Azure HDInsight, or hosted on Google DataProc.

Self-hosted Instance

To connect to Apache Hive from a self-hosted instance, set these connection parameters:

  • TransportMode: The transport mode to use to communicate with the Hive server. Legal entries are BINARY (default) and HTTP.
  • Server: The host name or IP address of the server hosting HiveServer2.
  • Port: The port for the connection to the HiveServer2 instance.
  • UseSSL (optional): Set to enable TLS/SSL.

Amazon EMR

To connect from an Amazon EMR-hosted Apache Hive instance, you must first create an SSH tunnel to EMR. When that is done, you are ready to connect.

Do the following:

  1. Create the SSH tunnel to EMR:
    1. Ensure that you have access to an active EMR cluster and an EC2 key pair. The key pair can be in .ppk or .pem format.
    2. Authorize inbound traffic in your cluster settings.
  2. After an active tunnel session to EMR is established, set these connection parameters:
    • Server: The master node (master-public-dns-name) where the Apache Hive server is running.
    • Port: The port required to connect to Apache Hive.
    • UseSSH: True.
    • SSHServer: The master node (master-public-dns-name).
    • SSHPort: 22.
    • SSHAuthMode: PUBLIC_KEY.
    • SSHUser: hadoop.
    • SSHClientCert: The full path to the key file.
    • SSHClientCertType: The type that corresponds to the key file. Typically either PEMKEY_FILE or PPKFILE.

Hadoop Cluster on Azure HDInsight

To establish a connection to a Hadoop cluster hosted on Azure HDInsight, set these connection parameters:

  • User: The cluster username you specified when creating the cluster on Azure HDInsight.
  • Password: The cluster password you specified when creating the cluster on Azure HDInsight.
  • Server: The server corresponding to your cluster. For example: myclustername.azurehdinsight.net.
  • Port: The port running HiveServer2 (443 is the default).
  • HTTPPath: The HTTP path for the hive2 service (hive2 by default).
  • TransportMode: HTTP.
  • UseSSL: True.
  • QueryPassthrough (optional): True to bypass the SQL engine of the add-in and execute HiveQL queries directly to Apache Hive.

Google DataProc

To connect from an Google DataProc instance, you must ensure that the Apache Hive server on DataProc is properly configured, then build an SSH tunnel to the hive cluster web interface. After those things are done, you are ready to connect.

Do the following:

  1. Ensure that the Apache Hive server on DataProc was created with the DataProc Component Gateway enabled.
  2. To obtain the external IP address of the Hive Cluster, load up the Cloud Shell and list the instances.
    gcloud compute instances list

    Note the external IP of the relevant machine.

  3. Build an SSH tunnel to the Hive cluster web interface:
    1. Navigate to the Hive cluster on DataProc and select the WEB INTERFACES tab.
    2. Select Create an SSH tunnel to connect to a web interface. A cloud console command displays.
      Use this console to create an SSH key pair. Download the private key from the directory specified in the console.
    3. Configure the SSH tunnel in an SSH utility:
      • Host Name: The external IP noted above.
      • Port: 22
      • Point the tool to your private SSH key.
      • For the Tunnel, map an open port to localhost:10000. The server properly resolves localhost.
  4. Now you are ready to connect to Apache Hive on Google DataProc. Set these connection parameters:
    • TransportMode: BINARY.
    • AuthScheme: Plain.
    • Port: The chosen SSH Tunnel port on the local machine.

Authenticating to Apache Hive

Apache Hive supports PLAIN SASL, LDAP, NOSASL, and Kerberos authentication.

PLAIN SASL

To authenticate to Apache Hive with PLAIN SASL, set the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to None and set the following add-in connection properties:

  • AuthScheme: PLAIN
  • User The user logging in. If no name is supplied, the login user is set to "anonymous".
  • Password The user's password. If no password is supplied, the password is set to "anonymous".

LDAP

To authenticate to Apache Hive with LDAP, set the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to LDAP and set the following connection properties in the add-in.
  • AuthScheme: LDAP
  • User The user logging in.
  • Password The user's password.

NOSASL

Authenticate to Apache Hive using NOSASL by setting both the AuthScheme add-in connection property and the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to NOSASL

Kerberos

To authenticate to Apache Hive with Kerberos, set AuthScheme to KERBEROS.

Authenticating to Apache Hive via Kerberos requires you to define authentication properties and to choose how Kerberos should retrieve authentication tickets.

Retrieve Kerberos Tickets

Kerberos tickets are used to authenticate the requester's identity. The use of tickets instead of formal logins/passwords eliminates the need to store passwords locally or send them over a network. Users are reauthenticated (tickets are refreshed) whenever they log in at their local computer or enter kinit USER at the command prompt.

The add-in provides three ways to retrieve the required Kerberos ticket, depending on whether or not the KRB5CCNAME and/or KerberosKeytabFile variables exist in your environment.

MIT Kerberos Credential Cache File

This option enables you to use the MIT Kerberos Ticket Manager or kinit command to get tickets. With this option there is no need to set the User or Password connection properties.

This option requires that KRB5CCNAME has been created in your system.

To enable ticket retrieval via MIT Cerberos Credential Cache Files:

  1. Ensure that the KRB5CCNAME variable is present in your environment.
  2. Set KRB5CCNAME to a path that points to your credential cache file. (For example, C:\krb_cache\krb5cc_0 or /tmp/krb5cc_0.) The credential cache file is created when you use the MIT Kerberos Ticket Manager to generate your ticket.
  3. To obtain a ticket:
    1. Open the MIT Kerberos Ticket Manager application.
    2. Click Get Ticket.
    3. Enter your principal name and password.
    4. Click OK.

    If the ticket is successfully obtained, the ticket information appears in Kerberos Ticket Manager and is stored in the credential cache file.

The add-in uses the cache file to obtain the Kerberos ticket to connect to Apache Hive.

Note: If you would prefer not to edit KRB5CCNAME, you can use the KerberosTicketCache property to set the file path manually. After this is set, the add-in uses the specified cache file to obtain the Kerberos ticket to connect to Apache Hive.

Keytab File

If your environment lacks the KRB5CCNAME environment variable, you can retrieve a Kerberos ticket using a Keytab File.

To use this method, set the User property to the desired username, and set the KerberosKeytabFile property to a file path pointing to the keytab file associated with the user.

User and Password

If your environment lacks the KRB5CCNAME environment variable and the KerberosKeytabFile property has not been set, you can retrieve a ticket using a user and password combination.

To use this method, set the User and Password properties to the user/password combination that you use to authenticate with Apache Hive.

Enabling Cross-Realm Authentication

More complex Kerberos environments can require cross-realm authentication where multiple realms and KDC servers are used. For example, they might use one realm/KDC for user authentication, and another realm/KDC for obtaining the service ticket.

To enable this kind of cross-realm authentication, set the KerberosRealm and KerberosKDC properties to the values required for user authentication. Also, set the KerberosServiceRealm and KerberosServiceKDC properties to the values required to obtain the service ticket.

Connection Properties

The Connection properties describe the various options that can be used to establish a connection.

Managing Connections

After successfully authenticating to Apache Hive you will be able to customize the data you are importing. To learn more about this, see Managing Connections.

See Also

Copyright (c) 2024 CData Software, Inc. - All rights reserved.
Build 24.0.9060