Cmdlets for Apache Impala

Build 24.0.9060

Establishing a Connection

With the CData Cmdlets, users can install a data module, set the connection properties, and start scripting. This section provides examples of using the ApacheImpala Cmdlets with native PowerShell cmdlets, such as the CSV import and export cmdlets.

Installing and Connecting

If you have PSGet, you can install the cmdlets from the PowerShell Gallery with the following command. You can also obtain a setup from the CData site.

Install-Module ApacheImpalaCmdlets

The following line is then added to your profile, loading the cmdlets on the next session:

Import-Module ApacheImpalaCmdlets;

You can then use the Connect-ApacheImpala cmdlet to create a connection object that can be passed to other cmdlets:

$conn = Connect-ApacheImpala -Server '127.0.0.1' -Port '21050'

Connecting to Apache Impala

In order to connect to Apache Impala, set the following:

  • Server: The name or network address of the Impala server instance.
  • Port: The port for the connection to the Impala Server instance.
  • ProtocolVersion: The Thrift protocol version to use when connecting to the Impala server.
  • Database (optional): A default database to use when one is not supplied in the SQL query. This enables using table names without having to specify database.tablename in the query.
  • Pagesize (optional): The number of results to pull per page from Apache Impala when selecting data.
  • QueryPassthrough (optional): Indicates if the query should be passed to Impala as-is.
  • UseSSL (optional): Set this to enable TLS/SSL.

    When QueryPassthrough is set to false (the default), the CData Cmdlets for Apache Impala attempt to modify the query to conform to the format Impala requires.
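As a rough sketch, the optional properties above might be supplied to Connect-ApacheImpala alongside Server and Port. This assumes each connection property is exposed as a parameter of the same name, as Server and Port are in the earlier example; the database name is a placeholder.

```powershell
# Assumption: the Database connection property maps to a -Database parameter.
# 'default' is a placeholder database name; substitute your own.
$conn = Connect-ApacheImpala -Server '127.0.0.1' -Port '21050' -Database 'default'
```

With a default database set, queries can refer to tables by name alone instead of database.tablename.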

Authenticating to Apache Impala

There are several ways to authenticate to Apache Impala, including:

  • NoSasl
  • LDAP
  • Kerberos

NoSasl

When using NoSasl, no authentication is performed. Use it when connecting to a server from a trusted location, such as a test machine on your local network. NoSasl is the default AuthScheme, so no additional connection properties need to be set.

LDAP

To authenticate with LDAP, set the following connection properties:

  • AuthScheme: Set this to LDAP.
  • User: Set this to the user to log in as.
  • Password: Set this to the password of the user.

If the LDAP server enables the Unauthenticated Authentication Mechanism of Simple Bind, Password is optional rather than required.
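
The LDAP properties could be passed in the same call that sets Server and Port. This is a minimal sketch, assuming AuthScheme, User, and Password are accepted as same-named parameters; the credentials shown are placeholders.

```powershell
# Assumption: AuthScheme, User, and Password map to same-named parameters.
# 'impala_user' and 'secret' are placeholder credentials.
$conn = Connect-ApacheImpala -Server '127.0.0.1' -Port '21050' -AuthScheme 'LDAP' -User 'impala_user' -Password 'secret'
```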

Kerberos

Set the AuthScheme property to Kerberos. Please see Using Kerberos for details about how to authenticate with Kerberos.

Retrieving Data

The Select-ApacheImpala cmdlet provides a native PowerShell interface for retrieving data:

$results = Select-ApacheImpala -Connection $conn -Table "[CData].[Default].Customers" -Columns @("City","CompanyName") -Where "Country='US'"

The Invoke-ApacheImpala cmdlet provides an SQL interface. This cmdlet can be used to execute an SQL query via the Query parameter.
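
For illustration, the Select-ApacheImpala call above could be written as an SQL query along these lines. The sketch assumes Invoke-ApacheImpala takes the connection object through a Connection parameter, as the other cmdlets do, and that $conn was created with Connect-ApacheImpala.

```powershell
# Assumes $conn was created with Connect-ApacheImpala as shown earlier.
$results = Invoke-ApacheImpala -Connection $conn -Query "SELECT City, CompanyName FROM [CData].[Default].Customers WHERE Country = 'US'"
```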

Piping Cmdlet Output

The cmdlets return row objects to the pipeline one row at a time. The following line exports results to a CSV file:

Select-ApacheImpala -Connection $conn -Table "[CData].[Default].Customers" -Where "Country = 'US'" | Select-Object -Property * -ExcludeProperty Connection,Table,Columns | Export-Csv -Path c:\myCustomersData.csv -NoTypeInformation

You will notice that we piped the results from Select-ApacheImpala into a Select-Object cmdlet and excluded some properties before piping them into an Export-CSV cmdlet. We do this because the CData Cmdlets append Connection, Table, and Columns information onto each row object in the result set, and we do not necessarily want that information in our CSV file.

However, this makes it easy to pipe the output of one cmdlet to another. The following is an example of converting a result set to JSON:

 
PS C:\> $conn = Connect-ApacheImpala -Server '127.0.0.1' -Port '21050'
PS C:\> $row = Select-ApacheImpala -Connection $conn -Table "[CData].[Default].Customers" -Columns @("City","CompanyName") -Where "Country = 'US'" | select -first 1
PS C:\> $row | ConvertTo-Json
{
  "Connection":  {

  },
  "Table":  "[CData].[Default].Customers",
  "Columns":  [

  ],
  "City":  "MyCity",
  "CompanyName":  "MyCompanyName"
} 
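
If the Connection, Table, and Columns metadata is not wanted in the JSON output, the same Select-Object filter used for the CSV export can be applied first. A minimal sketch, assuming $row holds a row object returned by Select-ApacheImpala as in the session above:

```powershell
# Drop the metadata properties the cmdlets append, then serialize the remaining fields.
$row | Select-Object -Property * -ExcludeProperty Connection,Table,Columns | ConvertTo-Json
```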

Modifying Data

The cmdlets make data transformation and data cleansing easy. The following example loads data from a CSV file into Apache Impala, checking first whether a record already exists and needs to be updated instead of inserted.

Import-Csv -Path C:\MyCustomersUpdates.csv | %{
  # Look up an existing record with the same _id as the CSV row
  $record = Select-ApacheImpala -Connection $conn -Table "[CData].[Default].Customers" -Where "_id = '$($_._id)'"
  if($record){
    # A match was found, so update it with the values from the CSV row
    Update-ApacheImpala -Connection $conn -Table "[CData].[Default].Customers" -Columns @("City","CompanyName") -Values @($_.City, $_.CompanyName) -Where "_id = '$($_._id)'"
  }else{
    # No match was found, so insert the CSV row as a new record
    Add-ApacheImpala -Connection $conn -Table "[CData].[Default].Customers" -Columns @("City","CompanyName") -Values @($_.City, $_.CompanyName)
  }
}
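
Because the Where clauses above are built from CSV values, a value containing a single quote would break the filter. One possible way to guard against that is sketched below. ConvertTo-SqlLiteral is a hypothetical helper, not part of the cmdlets, and it assumes the Where parameter follows the standard SQL convention of escaping a single quote by doubling it.

```powershell
# Hypothetical helper (not part of the CData cmdlets): quotes a value for use in a Where clause.
function ConvertTo-SqlLiteral {
    param([string]$Value)
    # Double any embedded single quotes, then wrap the value in single quotes.
    "'" + $Value.Replace("'", "''") + "'"
}

# Build a filter from a value that contains a single quote.
$where = "_id = " + (ConvertTo-SqlLiteral "O'Brien")
# $where is now: _id = 'O''Brien'
```

The resulting string could then be passed as the -Where argument in the loop above in place of the inline string.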

Copyright (c) 2024 CData Software, Inc. - All rights reserved.