macOS DSN Configuration
This section shows how to set up ODBC connectivity and configure DSNs on macOS.
Minimum macOS Version
The CData ODBC Driver for Apache Hive requires macOS Sierra (10.12) or above.
Licensing the Driver
In a terminal, run the following commands to license the driver. To activate a trial, omit the <key> input.
cd "/Applications/CData ODBC Driver for Apache Hive/bin"
sudo ./install-license.sh <key>
You'll be prompted for a name and password. Enter the name and password of a macOS account with administrator privileges; sudo requires them to run the script.
Connecting to Apache Hive
Apache Hive supports connections from instances that are self-hosted, hosted on Amazon EMR, hosted on Azure HDInsight, or hosted on Google DataProc.
Self-hosted Instance
To connect to Apache Hive from a self-hosted instance, set these connection parameters:
- TransportMode: The transport mode used to communicate with the Hive server. Legal values are BINARY (default) and HTTP.
- Server: The host name or IP address of the server hosting HiveServer2.
- Port: The port for the connection to the HiveServer2 instance.
- UseSSL (optional): Set to True to enable TLS/SSL.
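Put together, a self-hosted connection might be defined as a DSN entry in odbc.ini like the following sketch. The DSN name, host, and port are illustrative assumptions, not values from this guide:

```ini
[Hive SelfHosted]
Driver = CData ODBC Driver for Apache Hive
TransportMode = BINARY
Server = hive.example.com   ; placeholder host name
Port = 10000                ; placeholder; 10000 is HiveServer2's common binary port
UseSSL = True
```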
Amazon EMR
To connect from an Amazon EMR-hosted Apache Hive instance, you must first create an SSH tunnel to EMR. When that is done, you are ready to connect. Do the following:
- Create the SSH tunnel to EMR:
- Ensure that you have access to an active EMR cluster and an EC2 key pair. The key pair can be in .ppk or .pem format.
- Authorize inbound traffic in your cluster settings.
- After an active tunnel session to EMR is established, set these connection parameters:
- Server: The master node (master-public-dns-name) where the Apache Hive server is running.
- Port: The port required to connect to Apache Hive.
- UseSSH: True.
- SSHServer: The master node (master-public-dns-name).
- SSHPort: 22.
- SSHAuthMode: PUBLIC_KEY.
- SSHUser: hadoop.
- SSHClientCert: The full path to the key file.
- SSHClientCertType: The type that corresponds to the key file. Typically either PEMKEY_FILE or PPKFILE.
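Taken together, the parameters above might appear in an odbc.ini DSN entry like this sketch. The DNS name, ports, and key path are placeholder assumptions:

```ini
[Hive EMR]
Driver = CData ODBC Driver for Apache Hive
Server = ec2-203-0-113-10.compute-1.amazonaws.com    ; placeholder master-public-dns-name
Port = 10000                                         ; placeholder Hive port
UseSSH = True
SSHServer = ec2-203-0-113-10.compute-1.amazonaws.com
SSHPort = 22
SSHAuthMode = PUBLIC_KEY
SSHUser = hadoop
SSHClientCert = /Users/me/keys/emr-key.pem           ; placeholder key path
SSHClientCertType = PEMKEY_FILE
```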
Hadoop Cluster on Azure HDInsight
To establish a connection to a Hadoop cluster hosted on Azure HDInsight, set these connection parameters:
- User: The cluster username you specified when creating the cluster on Azure HDInsight.
- Password: The cluster password you specified when creating the cluster on Azure HDInsight.
- Server: The server corresponding to your cluster. For example: myclustername.azurehdinsight.net.
- Port: The port running HiveServer2 (443 is the default).
- HTTPPath: The HTTP path for the hive2 service (hive2 by default).
- TransportMode: HTTP.
- UseSSL: True.
- QueryPassthrough (optional): True to bypass the SQL engine of the driver and execute HiveQL queries directly to Apache Hive.
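As an illustration, an HDInsight DSN entry in odbc.ini might look like this sketch; the cluster name and credentials are placeholder assumptions:

```ini
[Hive HDInsight]
Driver = CData ODBC Driver for Apache Hive
User = admin                                 ; placeholder cluster username
Password = ********                          ; cluster password
Server = myclustername.azurehdinsight.net
Port = 443
HTTPPath = hive2
TransportMode = HTTP
UseSSL = True
```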
Google DataProc
To connect from a Google DataProc instance, you must ensure that the Apache Hive server on DataProc is properly configured, then build an SSH tunnel to the Hive cluster web interface. After both steps are done, you are ready to connect. Do the following:
- Ensure that the Apache Hive server on DataProc was created with the DataProc Component Gateway enabled.
- To obtain the external IP address of the Hive cluster, open the Cloud Shell and list the instances:
gcloud compute instances list
Note the external IP of the relevant machine.
- Build an SSH tunnel to the Hive cluster web interface:
- Navigate to the Hive cluster on DataProc and select the WEB INTERFACES tab.
- Select Create an SSH tunnel to connect to a web interface. A cloud console command displays. Use this console to create an SSH key pair, then download the private key from the directory specified in the console.
- Configure the SSH tunnel in an SSH utility:
- Host Name: The external IP noted above.
- Port: 22
- Point the tool to your private SSH key.
- For the Tunnel, map an open local port to localhost:10000. The Hive server resolves localhost correctly, so the forwarded connection reaches HiveServer2.
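On macOS, the same tunnel can be opened with the built-in OpenSSH client instead of a separate SSH utility. This command is a sketch; the key path, username, external IP, and local port are placeholder assumptions:

```
ssh -i ~/.ssh/dataproc-key -N -L 10000:localhost:10000 myuser@203.0.113.10
```

The -L flag maps local port 10000 to localhost:10000 on the master node, and -N keeps the tunnel open without running a remote command.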
- Now you are ready to connect to Apache Hive on Google DataProc. Set these connection parameters:
- TransportMode: BINARY.
- AuthScheme: Plain.
- Port: The chosen SSH Tunnel port on the local machine.
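Assuming the tunnel from the previous steps, the DataProc DSN entry might look like this sketch. Server = localhost is an assumption based on the tunnel terminating on the local machine, and the port is a placeholder:

```ini
[Hive DataProc]
Driver = CData ODBC Driver for Apache Hive
TransportMode = BINARY
AuthScheme = Plain
Server = localhost      ; the local end of the SSH tunnel (assumption)
Port = 10000            ; placeholder: the local port you mapped in the tunnel
```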
Authenticating to Apache Hive
Apache Hive supports PLAIN SASL, LDAP, NOSASL, and Kerberos authentication.
PLAIN SASL
To authenticate to Apache Hive with PLAIN SASL, set the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to None and set the following driver connection properties:
- AuthScheme: PLAIN
- User: The user logging in. If no name is supplied, the login user is set to "anonymous".
- Password: The user's password. If no password is supplied, the password is set to "anonymous".
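A PLAIN SASL DSN entry might therefore look like this sketch, with placeholder host and credentials:

```ini
[Hive Plain]
Driver = CData ODBC Driver for Apache Hive
AuthScheme = PLAIN
Server = hive.example.com   ; placeholder
Port = 10000                ; placeholder
User = hiveuser             ; omit to log in as "anonymous"
Password = ********         ; omit to use "anonymous"
```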
LDAP
To authenticate to Apache Hive with LDAP, set the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to LDAP and set the following driver connection properties:
- AuthScheme: LDAP
- User: The user logging in.
- Password: The user's password.
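The LDAP case differs from PLAIN only in the AuthScheme value; a sketch with placeholder values:

```ini
[Hive LDAP]
Driver = CData ODBC Driver for Apache Hive
AuthScheme = LDAP
Server = hive.example.com   ; placeholder
Port = 10000                ; placeholder
User = hiveuser             ; placeholder LDAP user
Password = ********
```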
NOSASL
Authenticate to Apache Hive using NOSASL by setting both the AuthScheme driver connection property and the hive.server2.authentication property in your Hive configuration file (hive-site.xml) to NOSASL.
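Because the server and the driver must agree on NOSASL, the matching server-side fragment in hive-site.xml would look like this standard Hive property sketch:

```xml
<property>
  <name>hive.server2.authentication</name>
  <value>NOSASL</value>
</property>
```

On the driver side, set AuthScheme = NOSASL in the DSN to match.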
Kerberos
To authenticate to Apache Hive with Kerberos, set AuthScheme to KERBEROS.
Authenticating to Apache Hive via Kerberos requires you to define authentication properties and to choose how Kerberos should retrieve authentication tickets.
Retrieve Kerberos Tickets
Kerberos tickets are used to authenticate the requester's identity. The use of tickets instead of formal logins/passwords eliminates the need to store passwords locally or send them over a network. Users are reauthenticated (tickets are refreshed) whenever they log in at their local computer or enter kinit USER at the command prompt.

The driver provides three ways to retrieve the required Kerberos ticket, depending on whether or not the KRB5CCNAME and/or KerberosKeytabFile variables exist in your environment.
MIT Kerberos Credential Cache File
This option enables you to use the MIT Kerberos Ticket Manager or kinit command to get tickets. With this option there is no need to set the User or Password connection properties.
This option requires that KRB5CCNAME has been created in your system.
To enable ticket retrieval via MIT Kerberos Credential Cache Files:
- Ensure that the KRB5CCNAME variable is present in your environment.
- Set KRB5CCNAME to a path that points to your credential cache file. (For example, C:\krb_cache\krb5cc_0 or /tmp/krb5cc_0.) The credential cache file is created when you use the MIT Kerberos Ticket Manager to generate your ticket.
- To obtain a ticket:
- Open the MIT Kerberos Ticket Manager application.
- Click Get Ticket.
- Enter your principal name and password.
- Click OK.
If the ticket is successfully obtained, the ticket information appears in Kerberos Ticket Manager and is stored in the credential cache file.
The driver uses the cache file to obtain the Kerberos ticket to connect to Apache Hive.
Note: If you would prefer not to edit KRB5CCNAME, you can use the KerberosTicketCache property to set the file path manually. After this is set, the driver uses the specified cache file to obtain the Kerberos ticket to connect to Apache Hive.
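For reference, a Kerberos DSN entry using a credential cache might look like this sketch; the host and cache path are placeholder assumptions:

```ini
[Hive Kerberos]
Driver = CData ODBC Driver for Apache Hive
AuthScheme = KERBEROS
Server = hive.example.com              ; placeholder
Port = 10000                           ; placeholder
KerberosTicketCache = /tmp/krb5cc_0    ; only needed if KRB5CCNAME is not set
```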
Keytab File
If your environment lacks the KRB5CCNAME environment variable, you can retrieve a Kerberos ticket using a Keytab File.
To use this method, set the User property to the desired username, and set the KerberosKeytabFile property to a file path pointing to the keytab file associated with the user.
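In DSN form, the keytab method adds these properties; the principal and keytab path are placeholder assumptions, and the remaining DSN properties (Driver, Server, Port) are unchanged:

```ini
AuthScheme = KERBEROS
User = hiveuser                                               ; placeholder principal
KerberosKeytabFile = /etc/security/keytabs/hiveuser.keytab    ; placeholder path
```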
User and Password
If your environment lacks the KRB5CCNAME environment variable and the KerberosKeytabFile property has not been set, you can retrieve a ticket using a user and password combination.
To use this method, set the User and Password properties to the user/password combination that you use to authenticate with Apache Hive.
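In DSN form, this fallback reduces to the credential pair; the values shown are placeholders, and the remaining DSN properties are unchanged:

```ini
AuthScheme = KERBEROS
User = hiveuser        ; placeholder
Password = ********
```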
Enabling Cross-Realm Authentication
More complex Kerberos environments can require cross-realm authentication where multiple realms and KDC servers are used. For example, they might use one realm/KDC for user authentication and another realm/KDC for obtaining the service ticket.

To enable this kind of cross-realm authentication, set the KerberosRealm and KerberosKDC properties to the values required for user authentication. Also set the KerberosServiceRealm and KerberosServiceKDC properties to the values required to obtain the service ticket.
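A cross-realm configuration fragment might look like this sketch; all realm and KDC names are placeholder assumptions:

```ini
KerberosRealm = USERS.EXAMPLE.COM         ; realm/KDC used for user authentication
KerberosKDC = kdc.users.example.com
KerberosServiceRealm = HIVE.EXAMPLE.COM   ; realm/KDC used to obtain the service ticket
KerberosServiceKDC = kdc.hive.example.com
```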
Uninstalling the Driver
The easiest way to uninstall the driver is to open a terminal and run the included uninstall.sh script, located in the installation directory. For example:
cd "/Applications/CData ODBC Driver for Apache Hive"
sudo ./uninstall.sh
Note: The script needs to be run from the installation directory.