Cmdlets for Apache HBase

Build 24.0.9060

Establishing a Connection

With the CData Cmdlets, you can install a data module, set the connection properties, and start scripting. This section provides examples of using the Apache HBase Cmdlets with native PowerShell cmdlets, such as the CSV import and export cmdlets.

Installing and Connecting

If you have PSGet, you can install the cmdlets from the PowerShell Gallery with the following command. You can also obtain a setup from the CData site.

Install-Module ApacheHBaseCmdlets

Then add the following line to your PowerShell profile so that the cmdlets are loaded at the start of each new session:

Import-Module ApacheHBaseCmdlets;

You can then use the Connect-ApacheHBase cmdlet to create a connection object that can be passed to other cmdlets:

$conn = Connect-ApacheHBase -Server '127.0.0.1' -Port 8080

Connecting to Apache HBase

The CData Cmdlets PowerShell Module for Apache HBase connects to Apache HBase via the HBase REST (Stargate) server.

To connect to Apache HBase, set these parameters:

  • Server: The host name, IP address, or URL of the server hosting Apache HBase. If there are multiple nodes, use the host name, IP address, or URL of the machine running the REST (Stargate) server.
  • Port: The port for the Apache HBase REST (Stargate) server.

Authenticating to Apache HBase

The CData Cmdlets PowerShell Module for Apache HBase supports the following authentication schemes:

  • Anonymous
  • Basic
  • Negotiate (Kerberos)

Anonymous

By default, no authentication (alternatively known as "anonymous" authentication) is used. Set AuthScheme to None to explicitly enforce no authentication.

Basic

To use Basic authentication, set the following:

  • AuthScheme: Set this to Basic.
  • User: Set this to the Apache HBase user.
  • Password: Set this to the Apache HBase password.
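
As a minimal sketch, a Basic-authenticated connection might look like the following. It assumes that the AuthScheme, User, and Password connection properties are exposed as same-named parameters on Connect-ApacheHBase; the server address and port are placeholder values.

```powershell
# Assumption: connection properties map to same-named cmdlet parameters.
# Prompt for credentials instead of hard-coding them in the script.
$cred = Get-Credential -Message 'Apache HBase credentials'

$conn = Connect-ApacheHBase -Server '127.0.0.1' -Port 8080 `
    -AuthScheme 'Basic' `
    -User $cred.UserName `
    -Password ($cred.GetNetworkCredential().Password)
```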

Kerberos

To authenticate to Apache HBase with Kerberos, set AuthScheme to NEGOTIATE.

Authenticating to Apache HBase via Kerberos requires you to define authentication properties and to choose how Kerberos should retrieve authentication tickets.

Retrieve Kerberos Tickets

Kerberos tickets are used to authenticate the requester's identity. The use of tickets instead of formal logins/passwords eliminates the need to store passwords locally or send them over a network. Users are reauthenticated (tickets are refreshed) whenever they log in at their local computer or enter kinit USER at the command prompt.

The cmdlet provides three ways to retrieve the required Kerberos ticket, depending on whether the KRB5CCNAME environment variable exists and whether the KerberosKeytabFile property is set.

MIT Kerberos Credential Cache File

This option enables you to use the MIT Kerberos Ticket Manager or kinit command to get tickets. With this option there is no need to set the User or Password connection properties.

This option requires that the KRB5CCNAME environment variable exists on your system.

To enable ticket retrieval via MIT Kerberos Credential Cache Files:

  1. Ensure that the KRB5CCNAME variable is present in your environment.
  2. Set KRB5CCNAME to a path that points to your credential cache file. (For example, C:\krb_cache\krb5cc_0 or /tmp/krb5cc_0.) The credential cache file is created when you use the MIT Kerberos Ticket Manager to generate your ticket.
  3. To obtain a ticket:
    1. Open the MIT Kerberos Ticket Manager application.
    2. Click Get Ticket.
    3. Enter your principal name and password.
    4. Click OK.

    If the ticket is successfully obtained, the ticket information appears in Kerberos Ticket Manager and is stored in the credential cache file.

The cmdlet uses the cache file to obtain the Kerberos ticket to connect to Apache HBase.

Note: If you would prefer not to edit KRB5CCNAME, you can use the KerberosTicketCache property to set the file path manually. After this is set, the cmdlet uses the specified cache file to obtain the Kerberos ticket to connect to Apache HBase.
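
The credential cache option could be scripted roughly as follows. This is a sketch under assumptions: the cache path is the example path from the steps above, and AuthScheme and KerberosTicketCache are assumed to be available as same-named parameters on Connect-ApacheHBase.

```powershell
# Option A: set KRB5CCNAME for the current PowerShell session only.
$env:KRB5CCNAME = 'C:\krb_cache\krb5cc_0'
$conn = Connect-ApacheHBase -Server '127.0.0.1' -Port 8080 -AuthScheme 'NEGOTIATE'

# Option B: leave KRB5CCNAME untouched and pass the cache path directly.
# Assumption: the KerberosTicketCache property is exposed as a parameter.
$conn = Connect-ApacheHBase -Server '127.0.0.1' -Port 8080 `
    -AuthScheme 'NEGOTIATE' `
    -KerberosTicketCache 'C:\krb_cache\krb5cc_0'
```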

Keytab File

If your environment lacks the KRB5CCNAME environment variable, you can retrieve a Kerberos ticket using a keytab file.

To use this method, set the User property to the desired username, and set the KerberosKeytabFile property to a file path pointing to the keytab file associated with the user.
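
A keytab-based connection might be sketched like this. The user name and keytab path are hypothetical placeholders, and the example assumes that User and KerberosKeytabFile are exposed as same-named parameters on Connect-ApacheHBase.

```powershell
# Assumption: connection properties map to same-named cmdlet parameters.
# 'hbase_user' and the keytab path are placeholders for illustration.
$conn = Connect-ApacheHBase -Server '127.0.0.1' -Port 8080 `
    -AuthScheme 'NEGOTIATE' `
    -User 'hbase_user' `
    -KerberosKeytabFile 'C:\krb\hbase_user.keytab'
```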

User and Password

If your environment lacks the KRB5CCNAME environment variable and the KerberosKeytabFile property has not been set, you can retrieve a ticket using a user and password combination.

To use this method, set the User and Password properties to the user/password combination that you use to authenticate with Apache HBase.

Enabling Cross-Realm Authentication

More complex Kerberos environments can require cross-realm authentication, where multiple realms and KDC servers are used. For example, an environment might use one realm/KDC for user authentication and another realm/KDC for obtaining the service ticket.

To enable this kind of cross-realm authentication, set the KerberosRealm and KerberosKDC properties to the values required for user authentication. Also, set the KerberosServiceRealm and KerberosServiceKDC properties to the values required to obtain the service ticket.
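
With this many properties, splatting keeps the call readable. The following is a sketch only: the realm and KDC names are invented placeholders, and it assumes each Kerberos property is exposed as a same-named parameter on Connect-ApacheHBase.

```powershell
# Assumption: connection properties map to same-named cmdlet parameters.
# All realm and KDC values below are hypothetical placeholders.
$connParams = @{
    Server               = '127.0.0.1'
    Port                 = 8080
    AuthScheme           = 'NEGOTIATE'
    # Realm and KDC used to authenticate the user
    KerberosRealm        = 'USERS.EXAMPLE.COM'
    KerberosKDC          = 'kdc.users.example.com'
    # Realm and KDC used to obtain the service ticket
    KerberosServiceRealm = 'SERVICES.EXAMPLE.COM'
    KerberosServiceKDC   = 'kdc.services.example.com'
}

$conn = Connect-ApacheHBase @connParams
```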

Retrieving Data

The Select-ApacheHBase cmdlet provides a native PowerShell interface for retrieving data:

$results = Select-ApacheHBase -Connection $conn -Table "Account" -Columns @("Id", "Name") -Where "Industry='Floppy Disks'"

The Invoke-ApacheHBase cmdlet provides an SQL interface. This cmdlet can be used to execute an SQL query via the Query parameter.
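
For instance, a query equivalent to the Select-ApacheHBase call might be issued as follows. This is a sketch that reuses the Account table and column names from the other examples and assumes $conn is an open connection.

```powershell
# Run an SQL statement through the Query parameter.
$results = Invoke-ApacheHBase -Connection $conn `
    -Query "SELECT Id, Name FROM Account WHERE Industry = 'Floppy Disks'"

# Print one line per returned row.
$results | ForEach-Object { '{0}: {1}' -f $_.Id, $_.Name }
```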

Piping Cmdlet Output

The cmdlets return row objects to the pipeline one row at a time. The following line exports results to a CSV file:

Select-ApacheHBase -Connection $conn -Table Account -Where "Industry = 'Floppy Disks'" | Select -Property * -ExcludeProperty Connection,Table,Columns | Export-Csv -Path c:\myAccountData.csv -NoTypeInformation

Note that the results from Select-ApacheHBase are piped into Select-Object, which excludes some properties, before being piped into Export-Csv. This is because the CData Cmdlets append Connection, Table, and Columns information onto each row object in the result set, and that information is usually not wanted in the CSV file.

However, this makes it easy to pipe the output of one cmdlet to another. The following is an example of converting a result set to JSON:

 
PS C:\> $conn = Connect-ApacheHBase -Server '127.0.0.1' -Port 8080
PS C:\> $row = Select-ApacheHBase -Connection $conn -Table "Account" -Columns @("Id", "Name") -Where "Industry = 'Floppy Disks'" | Select-Object -First 1
PS C:\> $row | ConvertTo-Json
{
  "Connection":  {

  },
  "Table":  "Account",
  "Columns":  [

  ],
  "Id":  "MyId",
  "Name":  "MyName"
} 

Deleting Data

The following line deletes any records that match the criteria:

Select-ApacheHBase -Connection $conn -Table Account -Where "Industry = 'Floppy Disks'" | Remove-ApacheHBase
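
Because deletion is destructive, it can help to inspect the matching rows first. The sketch below stores the rows in a variable named $toDelete (a name chosen for illustration) and assumes that stored row objects can be piped to Remove-ApacheHBase in the same way as rows streamed directly from Select-ApacheHBase.

```powershell
# Collect the matching rows; @() guarantees an array even for 0 or 1 results.
$toDelete = @(Select-ApacheHBase -Connection $conn -Table Account -Where "Industry = 'Floppy Disks'")

# Report how many rows match before removing anything.
Write-Host "About to delete $($toDelete.Count) record(s)."

# Delete only if there is something to delete.
if ($toDelete.Count -gt 0) {
    $toDelete | Remove-ApacheHBase
}
```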

Modifying Data

The cmdlets make both data transformation and data cleansing easy. The following example loads data from a CSV file into Apache HBase, checking first whether a record already exists and needs to be updated instead of inserted.

Import-Csv -Path C:\MyAccountUpdates.csv | %{
  $record = Select-ApacheHBase -Connection $conn -Table Account -Where ("Id = `'"+$_.Id+"`'")
  if($record){
    Update-ApacheHBase -Connection $conn -Table Account -Columns @("Id","Name") -Values @($_.Id, $_.Name) -Where "Id = '$($_.Id)'"
  }else{
    Add-ApacheHBase -Connection $conn -Table Account -Columns @("Id","Name") -Values @($_.Id, $_.Name)
  }
}
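
Values read from a CSV file can contain single quotes, which would break a Where clause built by string concatenation. The helper below is a hypothetical sketch, not part of the CData module; it assumes the cmdlets accept the standard SQL convention of doubling an embedded single quote.

```powershell
# Hypothetical helper (not part of the CData module): wraps a value in single
# quotes and doubles any embedded single quotes.
function ConvertTo-QuotedLiteral {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Value
    )
    "'" + $Value.Replace("'", "''") + "'"
}

# Build a Where clause for an Id that contains a quote character.
$where = "Id = " + (ConvertTo-QuotedLiteral "O'Brien")
# $where is now: Id = 'O''Brien'
```

Inside the import loop, the resulting string could be passed as -Where ("Id = " + (ConvertTo-QuotedLiteral $_.Id)) for both the lookup and the update.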

Copyright (c) 2024 CData Software, Inc. - All rights reserved.